Research Programming Artificial Intelligence Interviews Other

Serokell’s Work on GHC: Dependent Types, Part 4

Article by Vladislav Zavialov, Andrey Borzenkov

October 13th, 2024

17 min read

The GHC team at Serokell continues its work towards dependent types in Haskell. Our ultimate goal is that Haskell becomes a language where the use of advanced type system features is not just possible but practical.

In this report, Serokell’s GHC team members Vladislav Zavialov and Andrei Borzenkov describe their recent contributions.

Summary

Here’s a brief overview of the topics covered in this report:

Highlight: Wildcard binders in type declarations
Highlight: Type syntax in expressions
Type constructors in patterns
Extension stability and TypeAbstractions
Function arity in error messages
Operator names in type arguments
Refactoring the kind checker
Bonus section: Collaborations

Wildcard binders in type declarations

We take for granted that in function equations we can replace unused variable bindings with wildcards. There’s even a warning to remind us:

const a b = a
  -- warning: [GHC-40910] [-Wunused-matches]
  --    Defined but not used: ‘b’

const a _ = a
  -- no warning

This is also true for type abstractions in function equations, which we added in GHC 9.10 and covered in our previous report. Here’s an example:

{-# LANGUAGE TypeAbstractions #-}

reparse :: forall a b. (Show a, Read b) => a -> b
reparse @a @b x = read @b (show x)

In reparse, we see type abstractions @a @b on the left and a type application @b on the right. The type variable bound by @a is unused, so we can instead use a wildcard binder @_:

{-# LANGUAGE TypeAbstractions #-}

reparse :: forall a b. (Show a, Read b) => a -> b
reparse @_ @b x = read @b (show x)
--       ↑
--      wildcard binders + TypeAbstractions

So far so good. Both _ and @_ are valid in term-level function equations. And to further demonstrate how wildcards are a natural part of Haskell, here’s an example of their use in a type family equation:

type Flipconst :: k1 -> k2 -> k1
type family Flipconst a b where
  Flipconst @_ @k _ b = b :: k
--           ↑    ↑
--     wildcard binders + TypeFamilies

But what about type declarations? Can we use _ and @_ in type synonyms, data/newtype declarations, classes, and so on? Let’s see:

type Const a b = a
  -- no error

type Const a _ = a
  -- error: [GHC-77878]
  --   Unexpected type ‘_’
  --   In the type declaration for ‘Const’
  --   A type declaration should have form
  --     type Const a b = ...

Turns out, no. Wildcards are rejected in type declarations. And this comes as a surprise, since we’ve seen how they’re supported in pretty much any other context.

As part of our mission to achieve feature parity between term- and type-level sublanguages of Haskell, we decided to introduce wildcard binders in type declarations.

We started off with a proposal amendment: #641. At first, we proposed a recursive extension to the grammar of type variable binders:

tv_bndr   ::=
  |     a_tv_bndr         -- binder for a visible argument
  | '@' a_tv_bndr         -- binder for an invisible argument
             
a_tv_bndr ::=
  | tyvar                 -- variable
  | '_'                   -- wildcard
  | '(' p_tv_bndr ')'     -- parenthesized binder

p_tv_bndr ::=
  | a_tv_bndr             -- plain binder
  | a_tv_bndr '::' kind   -- binder with a kind annotation

Based on the feedback from the GHC Steering Committee, we decided to scale it back to a flat structure:

tv_bndr ::=
  |     a_tv_bndr               -- binder for a visible argument
  | '@' a_tv_bndr               -- binder for an invisible argument

a_tv_bndr ::=
  | tyvar                       -- variable
  | '_'                         -- wildcard
  | '(' tyvar '::' kind ')'     -- variable with a kind annotation
  | '(' '_'   '::' kind ')'     -- wildcard with a kind annotation

The simpler variant of the grammar was accepted and we moved on to the implementation.

The tracking ticket for this change is #23501, the corresponding merge request is !13203, and it was merged into master in commit 5745dbd3.

The new feature is documented in section “6.4.17.4. Wildcard Binders in Type Declarations” of the User’s Guide.

Unused type variable binders may now be replaced with wildcards:

type UConst a b = a   -- unused named binder `b`
type WConst a _ = a   -- wildcard binder

Just like a named binder, a wildcard binder _ can be:

plain: _
kinded: (_ :: k -> Type)
invisible, plain: @_
invisible, kinded: @(_ :: k -> Type)

Those new forms of binders are allowed to occur on the LHSs of data, newtype, type, class, and type/data family declarations:

data D _ = ...
newtype N _ = ...
type T _ = ...
class C _ where ...
type family F _
data family DF _

However, we choose to reject them in forall telescopes and type family result variable binders (the latter being part of the TypeFamilyDependencies extension):

type family Fd a = _    -- disallowed  (WildcardBndrInTyFamResultVar)
fn :: forall _. Int     -- disallowed  (WildcardBndrInForallTelescope)

We updated the GHC AST to accommodate the new forms of binders. However, the TH AST remains intact to avoid a breaking change. Instead, we generate fresh names to replace wildcard binders in the GHC→TH conversion.

To put a cherry on top of the cake, these AST modifications allowed us to permit wildcards in kind-polymorphic type variable binders in constructor patterns, e.g.:

fn (MkT @(_ :: forall k. k -> Type) _ _) = ...

This example would not previously type-check unless the bound variable was named.

Wildcard binders and type abstractions are, in principle, separate features. However, both are included in the TypeAbstractions extension for administrative convenience.

Try out this feature as soon as GHC 9.12 is released and let us know what you think!

Type syntax in expressions

Those of you who closely follow the latest developments in GHC have probably heard aboutRequiredTypeArguments – a new language extension that we added in GHC 9.10. If it doesn’t sound familiar, check out its User’s Guide section 6.4.18. Required type arguments.

The implementation that was shipped with GHC 9.10 is sufficiently complete to allow the following code:

{-# LANGUAGE RequiredTypeArguments #-}

import Type.Reflection

tshow :: forall a -> Typeable a => String
tshow a = show (typeRep @a)

t1 = tshow Bool
t2 = tshow (Maybe String)

However, there are still some parts of the specification that have not been implemented. For example, in GHC 9.10 it is not possible to use the function type syntax t1 -> t2 in an argument to tshow:

t3 = tshow (Int -> Bool)
  -- GHC 9.10: error [GHC-66228]
  --   View pattern in expression context: Int -> Bool

We continue to invest our resources towards a complete implementation, so this is fixed in GHC 9.12:

t3 = tshow (Int -> Bool)
  -- GHC 9.12: accepted, no error

The relevant section of the proposal is 7.1 Syntax that introduces the so-called types in terms:

function types a -> b, a ⊸ b, a %m -> b
constrained types ctx => t
forall-quantification forall tvs. t, forall tvs -> t

This syntax is usually only found in types. But, as our example above demonstrates, with RequiredTypeArguments one might want to use type syntax in expressions.

Types in terms extend the grammar of expressions to account for this:

exp ::=
  | 'type' ktype
  | infixexp2 :: ctype
  | infixexp2

infixexp2 ::=
       | infixexp
 (NEW) | infixexp      '->'  infixexp2
 (NEW) | infixexp mult '->'  infixexp2
 (NEW) | infixexp      '=>'  infixexp2
 (NEW) | 'forall' tv_bndrs '.'  infixexp2
 (NEW) | 'forall' tv_bndrs '->' infixexp2

This means that all syntax that was previously exclusive to types is now also available in expressions.

Since GHC’s parser reuses the grammar of expressions to parse patterns, one of the challenges we had to solve was to figure out how to make function types play nicely with view patterns. Both are written a -> b. Our solution is to use one parser production in both cases and then disambiguate based on the enabled extensions.

The solution works rather well, except for a small problem with precedence: how should we parse a -> b :: t? The options are:

(a -> b) :: t
a -> (b :: t)

The answer to this question depends on whether we’re talking about function types (option 1) or view patterns (option 2). And it’s not easy to conditionally control precedence with a parser generator (remember that GHC uses happy rather than parser combinators).

To deal with this discrepancy, we had to introduce some unpleasant logic in post-processing of the parsed AST. The long-term plan is to change the precedence of view patterns. This will allow us to remove the workaround. In the meantime, GHC 9.12 will ship with -Wview-pattern-signatures, a new warning that we introduced to facilitate the transition. Here’s an example:

f (even -> True :: Bool) = ()
-- warning: [GHC-00834] [-Wview-pattern-signatures]
--   • Found an unparenthesized pattern signature
--     on the RHS of a view pattern.
--   • This code might stop working in a future GHC release
--     due to a planned change to the precedence of view patterns,
--     unless the view function is an endofunction.
--   • Current parse: ‘even -> (True :: Bool)’
--     Future parse: ‘(even -> True) :: Bool’
--   Suggested fix: Parenthesize the RHS of the view pattern

As the warning text suggests, there exists a simple way to make the code forward-compatible: add explicit parentheses.

Moving on from the parser to the AST, we extended both the GHC AST and the TH AST with new constructors:

 Syntax        |  GHC AST  |  TH AST 
---------------+-----------+--------------
 a -> b        | HsFunArr  | ConE (->)
 a %m -> b     | HsFunArr  | ConE FUN
 ctx => t      | HsQual    | ConstrainedE
 forall a. t   | HsForAll  | ForallE
 forall a -> t | HsForAll  | ForallVisE

The most intriguing consequence of this addition is that HsExpr and HsType – the GHC AST representations of terms and types respectively – now almost entirely overlap. This presents a perfect occasion for unifying those representations.

Such a refactoring would not only enhance code sharing among compiler components, but also lay the groundwork for GHC’s planned support for dependent types. We created a ticket to track it, where the idea is described in more detail: #25121 “Unify HsExpr and HsType”.

Type constructors in patterns

In our efforts to make the implementation of RequiredTypeArguments more complete, we also updated name resolution in patterns. Here’s an example of code that (incorrectly) tries to match on a type:

{-# LANGUAGE RequiredTypeArguments #-}

f :: forall a -> a
f Int = 42
  -- GHC 9.10: error [GHC-76037]
  --   Not in scope: data constructor ‘Int’

The error message in GHC 9.10 is technically correct, as there is indeed no data constructor Int in scope. However, it is not at all helpful because it fails to guess the intent of the programmer, who surely meant the type constructor Int.

Per GHC Proposal #281, section 7.2 Name resolution, the compiler needs to look up the constructor in both namespaces before it gives up and reports that it’s not in scope.

In GHC 9.12, name resolution succeeds. The error message now comes from the type checker, which is able to describe the issue much more accurately:

f :: forall a -> a
f Int = 42
  -- GHC 9.12 error: [GHC-25897]
  --   • Couldn't match expected type ‘a’ with actual type ‘Int’
  --     ‘a’ is a rigid type variable bound by the type signature for:
  --         f :: forall a -> a
  --   • In the pattern: Int
  --     In an equation for ‘f’: f Int = 42

Furthermore, now that type constructors don’t trip up name resolution, we are a bit closer to dependent types, which will allow matching on type constructors.

Extension stability and `TypeAbstractions`

There’s an ongoing effort to manage the evolution of language extensions, as described in GHC Proposal #601 “Extension Lifecycle Framework”.

The idea is to help Haskell programmers better navigate the vast landscape of Haskell features and make better informed choices about what to use in their libraries and applications and what to avoid.

The extensions that we recently added to the compiler, namely RequiredTypeArguments and TypeApplications, are still undergoing active development. So we made the choice to mark both of them as experimental.

We also added per-section “since”-annotations in the User’s Guide to clearly communicate how those features have been evolving over time. For example, type abstractions are available in patterns since 9.2, in type declarations since 9.8, and in function equations since 9.10.

The gradual rollout of language features allows early adopters to try out the newly added functionality with ease, while those who seek stability may prefer to wait until all known issues are resolved.

Function arity in error messages

The notion of function arity is not as straightforward as one might think. Arity is the number of arguments that a function expects. In a curried language like Haskell, there is more than one way to count those arguments.

The evaluation-focused notion of arity is the number of value arguments that need to be supplied before evaluation can take place:

Int                  -- has arity == 0
Int -> Int           -- has arity <= 1
Int -> Bool -> Int   -- has arity <= 2

We write <= rather than == as sometimes evaluation can occur before all value arguments are supplied, depending on the actual function definition.

This evaluation-focused notion of arity ignores type arguments, so:

forall a.   a            -- has arity == 0
forall a.   a -> a       -- has arity <= 1
forall a b. a -> b -> a  -- has arity <= 2

Class dictionaries, on the other hand, do count towards the arity in this conceptual framework, as they are passed at runtime:

forall a.   (Num a)        => a            has arity <= 1
forall a.   (Num a)        => a -> a       has arity <= 2
forall a b. (Num a, Ord b) => a -> b -> a  has arity <= 4

The syntactic notion of arity is the number of visible arguments, i.e. arguments that occur visibly in the source code. Let’s call it vis-arity so as not to confuse it with evaluation-focused arity.

In a function call f x y z, we can confidently say that f’s vis-arity >= 3, simply because we see three arguments x, y, z. We write >= rather than == as this could be a partial application.

At definition sites, we can acquire an underapproximation of vis-arity by counting the patterns on the LHS, e.g. f a b = rhs has vis-arity >= 2. The actual vis-arity can be higher if there is a lambda on the RHS, e.g. f a b = \c -> rhs.

If we look at the types, we can observe the following
- function arrows a -> b add to the vis-arity
- visible foralls forall a -> b add to the vis-arity
- constraint arrows a => b do not affect the vis-arity
- invisible foralls forall a. b do not affect the vis-arity
This means that forall-visibility matters for vis-arity (in contrast to evaluation-focused arity), while the type/value distinction is unimportant (again in contrast to the other notion of arity).

Examples:
```
Int                         -- vis-arity == 0   (no args)
Int -> Int                  -- vis-arity == 1   (1 funarg)
forall a. a -> a            -- vis-arity == 1   (1 funarg)
forall a. Num a => a -> a   -- vis-arity == 1   (1 funarg)
forall a -> Num a => a      -- vis-arity == 1   (1 req tyarg, 0 funargs)
forall a -> a -> a          -- vis-arity == 2   (1 req tyarg, 1 funarg)
Int -> forall a -> Int      -- vis-arity == 2   (1 funarg, 1 req tyarg)
```
With TypeApplications and TypeAbstractions, it is possible to visibly bind and pass invisible arguments, e.g. f @a x = ... or f @Int 42. Those @-prefixed arguments are ignored for the purposes of vis-arity.

Distinguishing arity and vis-arity has become particularly important with the introduction of RequiredTypeArguments. Ticket #24318 revealed that GHC did not do a good job of that.

We fixed this problem. The result is a subtle but important change to error messages:

 f :: forall a -> Bool
 f a b = True
   -- error: [GHC-83865]
   --  • Couldn't match expected type ‘Bool’ with actual  type ‘t0 -> Bool’
-  --  • The equation for ‘f’ has two value arguments,
+  --  • The equation for ‘f’ has two visible arguments,
   --    but its type ‘forall (a :: k) -> Bool’ has only one

Operator names in type arguments

Ticket #24570 identified a scoping issue that affected RequiredTypeArguments:

type (!@#) = Bool
f = idee (!@#)      -- Not in scope: ‘!@#’  (BUG)

Type operators differ from term operators in that they are lexically classified as (type) constructors, not as (type) variables. In the initial implementation of required type arguments, we did not account for this difference, hence the bogus “not in scope” error.

The problem is now fixed, and the example program is accepted.

Refactoring the kind checker

GHC’s type checker is bidirectional, meaning it can operate in one of two modes:

Checking mode is used when the expected type is known, e.g. the programmer wrote an explicit type signature
Inference mode is used when the type is unknown, e.g. the type signature is partial or not present

As the type checker recurses into subexpressions, it may switch from one mode to another. Naturally, we want to maximize code sharing between checking and inference. So instead of implementing checking and inference separately, there’s a data type to track the current mode:

data ExpType = Check TcType
             | Infer !InferResult
  -- InferResult is a mutable cell
  -- to be filled by the inferred type

However, the kind checker did not use this type, which meant a certain amount of code duplication.

We discovered that this duplication was the cause of #24299, where the TH module finalizers were ignored in inference mode. And by refactoring the kind checker to make use of ExpType, we killed two birds with one stone:

Fixed the bug with TH module finalizers
Made the kind checker and type checker more similar in structure, which will help with their eventual unification, a step towards dependent types

It is surely nice when a refactoring also happens to fix a bug.

Bonus section: Collaborations

In addition to championing certain features from an idea to a proposal to an implementation, we are also happy to join forces with other GHC contributors and do code review or help with resolving small issues.

Here are a few examples of that:

Krzysztof Gogolewski’s merge request !11036 “Fix quantification order for a `op` b and a %m -> b” was blocked because the change required a GHC proposal. We wrote the proposal and conducted an impact assessment: #640.
Brandon Chinn’s merge request !12089 “Implement multiline strings” involved changes to the lexer and parser. We reviewed the code and suggested a benchmark that pointed to potential performance improvements.
Ticket #22624 tracks the implementation of Modifiers. We are actively participating in the discussion of technical details.
Simon Peyton Jones’ proposal !675 “Add quiet-forall binders” proposes a new variant of forall binders. We participated in the discussion and suggested clarifications to the text of the proposal.
Jakob Brünker’s merge request !12776 “Type/data instances: require that the instantiation is determined by the LHS alone” builds upon our prior work in MR !10672. We are thankful that someone picked this up and we are excited to see further developments in this direction!
Simon Peyton Jones’ merge request !12319 “Just a start on specialising expressions” was blocked due to parsing conflicts. We identified the source of the conflicts, explained the necessary grammar changes, and pushed a fix.
Mario Blažević’s merge request !11853 “exportable named defaults” turned out to be a breaking change. We identified the problem and submitted a quick fix.