Functional Futures: Across the Kmettverse with Edward Kmett

Article by Jonn Mostovoy
August 23rd, 2022
24 min read

In this month’s episode of Functional Futures, our guest is Edward Kmett – Head of Software Engineering at Groq and the author of many widely-used Haskell libraries.

In the episode, we dive into his life story – how he went from being broke and in debt to learning category theory and authoring numerous Haskell libraries. We also talk about programming languages and projects beyond Haskell that make Edward excited right now.

We hope you’ll enjoy the episode. As always, you can watch it on our YouTube channel or listen to the audio version.

Below are a few highlights of the episode, edited for clarity.

Highlights of the episode

Edward’s Haskell libraries

Jonn: How does it feel to be the author of so many libraries used everywhere from academia to industry?

Edward: It’s really weird. I never really planned on it – I mean, I sort of did. I came into the Haskell ecosystem and I looked around – at the time, it was just right after they released Cabal, so we had just got to a point where we had a package management ecosystem. Previously, there really wasn’t a good way to build on other people’s code – and I started just trying to write down all this category theory stuff that I was hearing about, mostly through the Haskell channel, and trying to make sense of it.

And I just kept building a bigger and bigger library worth of category theory stuff. I took over one that had previously been maintained by somebody else, and it just kept metastasizing, this larger and larger thing. And then Conal Elliott reached out to me and said: “Hey, I would really like to use comonads out of category-extras, but there’s like 300 modules or something like that in there, and there’s no way I can bring myself to pick up that much of a dependency.”

Russell O’Connor and some other folks were trying to convince me that maybe I should break this thing up into pieces. I did it, and somehow, I don’t know, it destroyed my productivity. This was the first of several such blows that I took.

I broke it into like 50 different packages, or 25, 30 different packages, or whatever, and I just couldn’t really make any rhyme or reason out of it. Eventually, I found a way to fuse some things together, and I got back into a productive workflow. Then, I added more and more packages, and it just became a bigger and bigger thing.

There were like two or three of these, like, oh my god, I was stretched to the breaking point, how am I holding all of these pieces together and dealing with all the people sending me patches on GitHub. I think there was a point at which I was the fifth most active user on all of GitHub, and the four people above me were companies. And it just became very bizarre.

There were two audiences when I really started doing this, like in the Haskell community. There were the academic folks, who weren’t being paid to maintain anything, and the industry folks, who’d maintain the wrong idea forever, because, gosh, they needed it. And there was nobody who would just write something down, take the stuff that was really low-hanging fruit – not academic, not publishable results – just get it from academia into practice and agree to maintain it for a decade or so. And now it’s a little over a decade later, and it’s largely been maintained. Now, it’s mostly on the back of other folks, Ryan Scott and those folks who just kind of keep the lights on when my attention gets distracted.

I will admit, it’s kind of weird, because I feel like it should eat more of my life right now than it does, but because of that long series of accidents which ended with a big cancer scare where I made sure I had a bunch of clearer successors for a bunch of projects – didn’t quite end there, but like that was where I think a large chunk of the downshifting in time consumption came in – I’m somehow able to hold down a day job and help out with other causes, and do things. It’s a little overwhelming at times, I will confess.

Jonn: When you have an abstraction that you really like and know problems for, and you apply it as a solution to problems you don’t know it will necessarily apply to – do you feel any emotions about people not using this abstraction and instead rolling their own, or just not seeing it?

Edward: Let’s look at what happened with prisms as an example, within the lens library. We had some idea that there should be some kind of inverse of a lens, something that works with case matching. We kind of figured it out – I’ve talked about it in previous podcasts and other forums and stuff like that, so I don’t really need to rehash that story – but it took us a long time to find the right theory for it, then define the right encoding of it. I think the first talk I gave on lenses used an earlier encoding of them, and then we found the right way to talk about it later.

But, then, you know, somebody wrote a library, I think it was called json-lens or lens-json, I don’t remember which one, whichever one’s the one that you would expect to reach for. And it tried to use lenses but it mangled the laws, and it really stuck in my teeth of, like, this isn’t really using lenses, this is just kind of trying to make the types fit, and it doesn’t work.

And so we sat down in the lens channel, and somebody else contributed the initial sketch of it, and we just wrote down, okay, well, JSON is a big sum type, let’s just write down all the cases and that was it, that was the entire freaking library design. And now all of a sudden lens went from being this thing that could work with state to this thing that would work with – well, if you happen to be building some kind of microservice, what do you work with, you take in some usually some CRUD pile of JSON or XML or whatever the heck it is, you smash it up using a bunch of state that happens to sit inside of a bunch of maps or whatever that are in your state, and then you spit out another big document or something like that. Maybe you interact with a database or two along the way.

This is the life of a CRUD developer, and the fact that all of those pieces – working with the records, working with the JSON that you’re reading, working with the XML that you’re emitting – all fit into the same vocabulary, I think, went a long way towards lens getting as broad adoption in industry as it did. If it was just this cute little theoretical tool for how to access fields, I don’t think anyone would have cared.
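The design Edward describes – JSON as one big sum type, with an optic per constructor – can be sketched with nothing beyond base. This is a hypothetical toy JSON type, not the real library’s API; the prisms are encoded as plain traversals here so no lens dependency is needed:

```haskell
{-# LANGUAGE RankNTypes #-}
-- A toy JSON sum type; each constructor gets an optic focusing on that case.
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data JSON
  = JNull
  | JBool Bool
  | JNumber Double
  | JString String
  | JArray [JSON]
  deriving (Show, Eq)

type Traversal s a = forall f. Applicative f => (a -> f a) -> s -> f s

-- Focus succeeds only when the constructor matches; otherwise leave it alone.
_JString :: Traversal JSON String
_JString f (JString s) = JString <$> f s
_JString _ j           = pure j

_JNumber :: Traversal JSON Double
_JNumber f (JNumber n) = JNumber <$> f n
_JNumber _ j           = pure j

-- Combinators, also needing nothing beyond base:
over :: Traversal s a -> (a -> a) -> s -> s
over t f = runIdentity . t (Identity . f)

preview :: Traversal s a -> s -> Maybe a
preview t s = case getConst (t (\a -> Const [a]) s) of
  (x : _) -> Just x
  []      -> Nothing
```

With this, `over _JNumber (* 2) (JNumber 21)` rewrites a number in place, and `preview _JString JNull` harmlessly returns `Nothing` – the “write down all the cases” design in miniature.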

And adding the other little trick that you don’t have to import lens in order to define lenses enabled it to sneak out into the ecosystem in a way that a traditional Haskell library couldn’t, at the expense of my email being filled with hate mail for writing non-Haskelly code for years. It finally settled down after people just accepted it as part of the ecosystem, but it did take a good long time.
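The trick being referred to works because a van Laarhoven lens is just a rank-2 function over an arbitrary `Functor`, so a library can export lenses for its own types without depending on lens at all. A minimal sketch, with a hypothetical `Point` type and hand-rolled combinators:

```haskell
{-# LANGUAGE RankNTypes #-}
-- Everything here comes from base; no lens import anywhere.
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

data Point = Point { px :: Int, py :: Int } deriving (Show, Eq)

type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

-- A library author can define these with only Functor in scope:
xL :: Lens Point Int
xL f (Point x y) = (\x' -> Point x' y) <$> f x

yL :: Lens Point Int
yL f (Point x y) = Point x <$> f y

-- Consumers (or the lens library itself) supply the combinators later,
-- by picking a concrete functor: Const to read, Identity to write.
view :: Lens s a -> s -> a
view l = getConst . l Const

set :: Lens s a -> a -> s -> s
set l a = runIdentity . l (const (Identity a))
```

Because `xL` and `yL` mention nothing but `Functor`, the defining library incurs zero dependencies – the lenses “sneak out” and only downstream users need lens for the fancy combinators.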

Jonn: How did you achieve this sort of openness that something can be a lens, and then you can –

Edward: A lens or a traversal or whatever, because there’s no new type there in the way, right?

Jonn: Yeah.

Edward: Optics turned around and came along later, and Gundry and folks tried to make that more opaque. But then you don’t get the beautiful nature of being able to pun the dot from prelude as both function composition and field accessor, you don’t get a lot of the things that make lens pretty, you don’t get the clean interop between all of the different optics, even ones you didn’t even know about – they just become things that are given to you kind of by common sense properties of category theory.

Now, I did point out at the time we did lens that you could do profunctor-based versions of this and we had to adopt the profunctor-based encoding at least for prisms, but because we didn’t want folks to have to incur a dependency on the profunctors library or on the lens library in order to write lenses and traversals, we wound up encoding it with the sort of weird half mix of van Laarhoven lenses and profunctor lenses, which then let optics kind of come in and say “hey, look, we did the profunctor story.” Well, great, yeah, I’m glad that we told you how that could work. And then they stuck it in newtypes, and I lost interest.

I understand it gives slightly prettier error messages; there are some things that you get out of it. But I think the research vein of – you know, all research on optics stopped the moment things got fossilized into that. The notion that a lens to me is a cofunctor – it has nothing to do with profunctors whatsoever, really – is something that you would never see if you’re squinting at it through that lens, no pun intended.

Jonn: That’s actually a really, really insightful thing. Because a lot of us in the industry have this mindset of oh, okay, let’s wrap everything in a newtype as soon as we can, and then kind of make it into a black box, right? Maybe I have some fundamental properties, but I don’t care because I’m writing industrial code, and I just want to wrap everything in a newtype. So what you’re describing, I’m not sure if it applies to the industry, but it’s a beautiful case against newtype, I guess.

Edward: In this particular case – Haskell doesn’t really have subtyping. Well, it sort of does: classes have a notion of subtyping between them, constraints have subtyping, you have the implication in the category of constraints. And so one way to get subtyping is to say, well, you know, if I have a free type variable a, then I can instantiate it – it subsumes Int or something like that, right? I can use it that way.

And the other way that I have that involves any sort of subtyping is I can say, well, if I need something that’s Eq, I could pass it something that’s Ord, because Ord is a subclass of Eq. And those are the only two mechanisms that I have to really express subtyping, where I can pass a thing covariantly or contravariantly in a given position – either I can return it covariantly or take it contravariantly.

And so the lens trick: I let you basically pass me any function with any choice of functor that you choose, and then the constraints on that functor are selected by the combinator you use it with. So you express the lens once, and then what happens is, when you stitch them together, the constraints get unified, right, or unioned – they roll up to the biggest set of constraints that you’ve incurred.

So that just sort of naturally falls out that you get lens composition. Composing a lens with a traversal gets you a traversal, all that kind of stuff just works across the board, and it’s just exploiting the only notion of actual proper subtyping that we have.

Without it, you have to express some kind of lattice by hand, and then use your own machinery to kind of get that to roll up and ball up into the answer. When you go to view over a traversal or something like that, you wind up asking yourself for a monoid for the summary or whatever that you’re producing. That judo flip move comes explicitly from lenses opening up their guts, and optics can’t express that pattern.
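A rough sketch of how the constraints “roll up” in the van Laarhoven encoding: optics are plain functions, so they compose with `(.)`; a lens silently upgrades to a traversal wherever an `Applicative` is demanded; and folding through `Const` is exactly where the surprise `Monoid` constraint comes from. All names here are illustrative, not from any real library:

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))

-- A Lens only needs Functor; a Traversal needs Applicative.
type Lens s a      = forall f. Functor f     => (a -> f a) -> s -> f s
type Traversal s a = forall f. Applicative f => (a -> f a) -> s -> f s

data User = User { name :: String, pets :: [String] } deriving (Show, Eq)

petsL :: Lens User [String]
petsL f (User n ps) = User n <$> f ps

eachT :: Traversal [String] String
eachT = traverse

-- Folding over a Traversal instantiates f = Const m, whose Applicative
-- instance requires Monoid m: that is where the monoid "comes out of
-- nowhere" when you summarize multiple targets.
foldMapOf :: Monoid m => Traversal s a -> (a -> m) -> s -> m
foldMapOf t f = getConst . t (Const . f)

toListOf :: Traversal s a -> s -> [a]
toListOf t = foldMapOf t (: [])
```

Here `petsL . eachT` is ordinary function composition, yet it typechecks as a `Traversal User String`: the `Functor` constraint of the lens and the `Applicative` constraint of the traversal union up to `Applicative`, with no lattice machinery written by hand.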

Now, you know, the folks behind optics would turn around and say that it’s a safer library because you don’t have these weird monoid constraints coming out of nowhere. But my primary goal for lens was to build a library for me, warts and all included, in that I wanted a vocabulary I would still be using a few years later. My goal was: in two years, I still want to be using this library for fiddling with field access and stuff – and now we’re 10 years later, and it’s still going strong.

There’s a whole cottage industry of folks exploring lenses for category theory, for things that have absolutely nothing to do with our computer science uses, there’s folks using them for talking about derivatives of functions –

Jonn: Like, actual numerical derivatives?

Edward: Numerical derivatives, yes, you can actually view them almost like a lens, and if you actually take the proper cofunctor encoding of things, you can actually make that rigorous, and there’s a bunch of things that we can turn around and say. Now the problem is, of course, you can’t really talk about cofunctors in Haskell, and for those following along at home, cofunctors are not contravariant functors, despite what some people in the Haskell community may have abused terminology to refer to them as, they’re a different thing, and it took me a long time, but I do finally agree that that’s actually the right term.

How Edward learned programming

Jonn: In your talk “Stop Treading Water”, you mentioned that you weren’t born into a family of mathematicians, drawing arrows in the sand since you were three years old, right? You somehow got to the place where you can fit all these abstractions in your head.

Edward: Yeah, I had one uncle who was a computer scientist – not really a computer scientist, he was more like a self-taught electrical engineer, but I thought of him as a computer scientist – and so he kind of showed me some BASIC and 7400-series TTL logic kind of stuff, and this led to some little robotics projects and stuff like that, that caused a bit of a house fire when I was a kid, after which I was forbidden from soldering.

And so that’s why I kind of became a software person. I’d been going down this road and then lit the house on fire, and then mom said no. And so after that I stuck to not opening up my computers and I just kept programming. But again, until like third grade I was in downtown Detroit, and my mom didn’t really want me going outside and playing with the kids in the neighborhood, it wasn’t terribly safe. She was happy to have me plugging away on the computer, and then we moved up to the suburbs, and I had transmogrified into the computer kid or whatever.

And so in third grade I lied and told the kid that I had written a disassembler for the Commodore 64, and then he didn’t believe me on a Friday, and so I went home and hacked it together and actually made it sort of work and slapped down a disassembler of the Commodore 64 Basic ROM on his desk, and he believed me, which is good because it was totally freaking wrong, and I spent the rest of that year trying to make that thing right.

And that’s how I learned to program, through writing this thing almost on a dare over a weekend, and then using that Basic ROM or whatever as my bible for how people must write code. And I didn’t realize that this thing had been kind of cranked out by Bill Gates himself back in the day and then optimized and optimized to fit in this Commodore, where they were just at the edge of things – so, like, the last byte of the sine table was the first byte of the next function and stuff like that. I just thought that was the way everyone programmed.

Again, a lot of my most outstanding successes have come from misunderstanding what is expected or what other people do. When I came into the Haskell community, I assumed the entire Haskell community was full of category theorists, because I was sitting on the Haskell IRC channel and Cale Gibbard was there using category theory I didn’t know to solve computer science problems I didn’t know, and I’d just done a masters in each of these freaking things. It was really frustrating to me, because I just hadn’t had the context, so I just assumed this entire community was made out of folks like Cale. And little did I know that all he did was read papers all day. Now he actually is gainfully employed in doing stuff, but I’ll never lure him away from Ryan.

How Edward learned category theory

Jonn: I think that the most applicable thing, hopefully even actionable, for many people from our own audience would be maybe not even how you went from hacking on Basic to drawing arrows, but more like when you did that, and how it is possible, for example, to be a full-time engineer writing in Python or something like that and at the same time kind of build up intuitions, or maybe build up some hard skills. Because doing, for example, category theory and encoding stuff requires hard skills as well, it’s not just about understanding.

Edward: So this is where I think the answer that everyone wants to hear is the one that says, oh, gosh, you could totally do this if you just worked harder, and I don’t know that that is necessarily always just true. There’s a lot of luck, and a lot of picking the right problems, and a lot of deliberate almost sort of defocusing on what nominally should have been my priorities at given times that went into getting me to where I am and what I do.

Some of it is calamities that happened in my own personal life. Like, I went off and made and lost a lot of money trying to build a phone company / ISP in the late 90s through early 2000s, and I wound up rather crushingly in debt at the end of it, and so I spent basically a year on my now wife’s couch doing nothing. And she eventually started circling help wanted ads in the paper and just subtly leaving them there, you know.

So I went and decided I was going to go back – so I’d done one semester of college right out of high school, left because Ford offered me what I thought was a lot of money, wound up suing Ford over a bunch of other stuff, and wound up taking that money and helping to fund the phone company, which is its own drama. But the end of that was: hey, look, now I’m broke and I’ve also had to fire all my friends, I don’t have a social circle – which didn’t end up so well for me.

But I had a work ethic and I had the fact that I had been doing graphics and some game dev programming and whatever during that time while running the phone company. And it was really just – I didn’t want to turn into a suit and I really loved doing the graphics stuff. I found my way into this community called the demo scene, and they had all sorts of neat data structures and algorithms, and so I learned data structures and algorithms kind of from first principles in order to try and figure out how to fit things into demos or help other people fit things into their demos.

And a large chunk of that was the visual payout of seeing something render. I could get a very tangible result out of learning some linear algebra or a little bit of DFQ, or just playing around with what is inverse kinematics, really – oh that’s what it’s for – and then I can get the thing to actually grip the wall or whatever I’m trying to do.

Those pieces felt really real to me, and so when I got the opportunity to go back and collect degrees – I’d been a CTO for several years at that point in time, no one wanted to hire a CTO who didn’t have a formal education, and just doing that bender through academia helped me fix the resume imbalance.

This was right after the dot-com crash and then the sort of general internet crash that came afterward, the ISP crash that came like a year or so afterwards. And no one was going to hire me after that. Like, no one would believe I’d stay for anything low-level, and no one wanted to take a risk on a CTO-level kind of person who didn’t have a formal education, who couldn’t immediately point to their last thing as a success story in the desert of nobody having a success story.

So that’s when I went off and kind of hid in academia and collected a bunch of degrees, and the problem there was I just didn’t understand the game. I did my undergrad that year, I did a master’s of math the next semester, but I didn’t publish anything, I didn’t really get into what it takes to be a proper academic.

I was getting ready to go do a PhD, maybe over at Harvard or whatever with Greg Morrisett and the group that was just finishing up there. And that was going to move me to Boston. I got an offer to do some defense contracting out in Boston, so that’s how I got to Boston, and then Morrisett’s group over at Harvard wound up going a different direction than I was terribly interested in, so that application process kind of died on the vine. And I just did defense contracting, and then finance, and what have you.

But so: where did the math come from? Part of it was that I had this ability to just stop everything and go off and do an incredibly intensive academic bender for three or four years when I was 29-ish, somewhere in my late 20s. I got to just stop, disconnect from the world for three or four years, and add a pile of student loans (because I’m American) to my already nondischargeable debt, because I still owed things through the bankruptcy.

And then somehow I emerged from the other side reasonably sane and having found Haskell, because I didn’t find Haskell until the very end of that process, but having done this weird Frankenstein math degree, which was a little bit of everything that was offered by the university, I could make it through Mac Lane, which was really the only real inroads into category theory that anyone had at the time.

Now there are much kinder, gentler introductions – the ones Emily Riehl and folks have written, the HoTT book, and all that kind of stuff – there are a bunch of better references now than there were at the time I was trying to learn category theory. I still can’t say that I do category theory terribly well; I speak with a very strong Haskell accent. Like, folks at the Topos Institute almost make fun of me about it – they’re very kind about it, but I see it as such.

So I don’t know if that directly answers your question, but how did I get there and how did I do it – there’s really a series of these fortunate coincidences, these opportunities that were put in front of me that I kind of ran away with – or abused, rather – than something that’s like, gosh, here’s a good piece of general actionable advice that everyone in your audience can go off and take advantage of. Just find four years where you don’t need to be doing anything.

Exciting stuff beyond Haskell

Jonn: If you were to discover something today that’s not Haskell, what would you discover and be excited about, language-wise?

Edward: Yeah, actually, I enjoy Rust, not for cargo culting, going off and writing Rust because everyone’s writing Rust, but because if I look around in the Rust ecosystem, there’s a number of things that they pay attention to that Haskell doesn’t. And Haskell can’t because of some of the restrictions on the nature of the different languages.

I adore the fact that Rust is careful about tracking “niches” in its data types: if you have a reference to a thing, and you take an Option of that reference, the fact that references can’t be null means that the compiler is smart enough to figure out that, oh, I can jam the Option of a reference into the same space as the original reference and just take over the zero case.

And so the abstraction of putting a Maybe there doesn’t have a cost for that data type. And the nice thing about it is this works in general for small variants. So if I take an Option of a thing that has three different cases, I just get something that has four cases in it. Then I take an Option of that, and, oh my gosh, I’ve got five cases. And it’s still cramming it into the same byte or whatever it was going to use to store that data. And that’s adorable to me – the fact that it just does that, and it doesn’t put any preamble or prologue on it, it just happens, is amazing.

When I think about building a compiler, there’s a talk by Anders Hejlsberg and folks about what you need to build a modern compiler architecture, and it’s really about building sort of query-based structures, so you can ask it questions and it can recompute. In the Rust ecosystem, the Rust compiler is built on top of something called salsa, which is this project from Niko Matsakis, which is very similar to a little Haskell library called Watch, which implements much the same idea; Olle Fredriksson in Haskell has something called Make, which also implements the same idea; and we have Shake in the Haskell ecosystem, which implements much the same idea.

Each one of these things is sort of the same idea being groped at from different directions, but I think they really exploit Rust’s macro system to make it just pleasant to use. It just feels very Rusty to write code to use this sort of abstraction layer that is dealing with all of the lazy recomputation in the presence of changed inputs.
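The query-based idea that salsa, Shake, and friends are all groping at can be caricatured in a few lines: cache each derived result against the inputs it read, and recompute only when those inputs change. This is a toy sketch with hypothetical names – nothing like salsa’s or Shake’s actual API – with a recomputation counter to make the memoization visible:

```haskell
-- Toy query-based recomputation: derived results are cached against the
-- input contents they were computed from.
import Data.IORef
import qualified Data.Map.Strict as M

data DB = DB
  { inputs :: IORef (M.Map String String)        -- source "files"
  , cache  :: IORef (M.Map String (String, Int)) -- key -> (input seen, result)
  , runs   :: IORef Int                          -- count of real recomputations
  }

-- A derived query: length of a file, memoized against its contents.
lengthQuery :: DB -> String -> IO Int
lengthQuery db key = do
  src <- (M.! key) <$> readIORef (inputs db)
  c   <- readIORef (cache db)
  case M.lookup key c of
    Just (seen, res) | seen == src -> pure res -- input unchanged: reuse
    _ -> do
      modifyIORef' (runs db) (+ 1)             -- actually recompute
      let res = length src
      modifyIORef' (cache db) (M.insert key (src, res))
      pure res
```

Asking the same query twice touches the real computation once; editing the input invalidates just that entry. Real systems like salsa add dependency tracking between queries on top of this skeleton, so a change only dirties the queries downstream of it.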

And those kinds of things. Like, if I’m trying to write a compiler in Rust, well, what would I do? I’d probably take tree-sitter and, like, build the thing to lazily recompute as my syntax tree changes, use the fact that I’ve got tower-lsp to directly build the language server protocol (I absolutely could use the lsp package to do that in Haskell, but it’s still another thing), and then I would use something like salsa to manage the internals of it.

And then I’m just focusing on the pieces that are left over that I care about; all of the boilerplate has been taken away from me, and the fact that those libraries largely fit together in that ecosystem really appeals to me. I realize it’s a very long-winded answer to a throwaway, quick “where would I have to go to get excited” question. So that’s my workman-like answer – if I put on my industrial Haskell hat, that’s my answer.

If I put on my more theoretical Haskell hat, the main direction I was kind of going when I was working at MIRI was around, like, how do I make functional programming scale. And my sort of answer is: how do I make a dependently typed language that takes type classes – as Haskell calls type classes – seriously, in the sense of my “Type Classes vs. the World” rant, which is another talk that I’ve given at some point in time that people love or hate. Apparently, it somehow indirectly led to Rust’s traits, so I’m very happy that I gave the rant. But why do I care about that?

So I was working on this programming language project called coda, which is: how do I make dependently typed programming that takes type classes seriously, where they actually mean what Haskell programmers mean when they think about type classes – the category of constraints, at least the way I think of it. Or, like, basically [inaudible] theories or essentially algebraic theories, or whatever you want to call them in this particular framework.

And if I start going there, I start heading down towards like an extensional type theory that is mostly syntax directed, but where I can offer a little bit of help, and where each one of those residual problems that I get out of the type checking process becomes something that I sort of discharge almost through like a Coq obligation tactic or something like that.

That’s an awful lot of words, but the point of it is how do I make something that like feels like Haskell to program in, and that I could then incrementally refine until I get the confidence that I can have in an Agda- or a Coq-style program, but not stop the user at every step along the way, making sure that they’ve juggled getting the types right and all this kind of stuff in just the right order in order to build the perfect program all in one go.

Because you just have to hold too many things in your head. It’s, like, in Rust, I have to fight with the borrow checker all the damn time, even before I can even start to think the thoughts of my algorithm. Now, eventually you get used to wearing that straitjacket – it doesn’t confine you, you know, you just don’t move the wrong way.

In Agda and in Coq, you get used to it as well. But how can I make the sort of big moves that I can make in Haskell and still eventually get that confidence that I really want?

Jonn: What you’re describing is something that – so Lean 4 has functional dependencies, right?

Edward: Lean gets so close to a lot of what I want. And we could go into a very long-detailed rant about what’s good and what’s bad about Lean.

How do category theorists name things?

Jonn: I asked my colleagues if they have anything they would like to ask Edward, and one question kind of stood out to me as something marginal but funny. The question was: how do category theorists pick names for stuff and, additionally, do they still name things? I think that the latter bit is funnier, right? Like, we’re scared of more things to learn.

Edward: There aren’t like – category theorists will say, gosh, that’s Cartesian whatever, blah, blah, blah, you got a handful of names that have largely been selected. Then you go off into set theory, and set theorists will just call something a mouse. Yeah, it’s just a mouse. Why? Because it’s a mouse.

Like, you know, category theorists are not the most egregious of namers or something like that here. Was it May who named operads and monads? Like, they used to be called standard triples – that’s what a monad used to be, or whatever, back in the day. And then May (I might be wrong) suggested the name because, like, I guess, apocryphally, his mother was an opera singer – he named operads operads, and then like monads so they would go with operads, so this thing about operations would very nicely coincide with this thing on general operation. So there’s some naming that’s going on there, there’s always an art to it, obviously, but I don’t know.

A lot of it is – Grothendieck named everything, and so you just use his names, and maybe sometimes mangle some things in French. Distributors, profunctors, modules – there’s like a handful of different names for profunctors as an example.


Thanks to Edward for being a guest on our podcast!

If you want to hear more from Edward, you can follow him on Twitter.

To hear more from us, subscribe to Functional Futures on YouTube or your favorite podcasting platform.
