1ML for non-specialists: introduction
This is the first entry in what I intend as a multi-part series.
1ML is a type system designed by Andreas Rossberg and described in a collection of papers by him. When it comes to integrating an ML-style module system into a new language, I think 1ML currently provides the best overall story (although it’s not perfect).
Unfortunately, Rossberg’s papers are pretty technical. He is a good writer, but I feel that right now there is still a communication barrier between academics who are in a position to discuss 1ML in depth and people who are in a position to write new compilers. I see a lot more discussion online that merely references 1ML than I do discussion that seems to deeply understand it.
I’ve been studying 1ML for the last several months, and working on my own prototype implementation (which unfortunately isn’t yet in a publishable state). In an effort to help these ideas percolate down towards mainstream adoption, I want to help demystify some things that I found confusing.
This series will not substitute for reading the paper, but I hope it can be a useful companion. I'll be working from the version of the paper published in the Journal of Functional Programming in 2018. It'll be very helpful to be able to look at that paper as you read this.
Prerequisite knowledge
In writing this, I assume the reader:
- Has some familiarity with inference rule notation and how type systems are defined using a combination of grammars and inference rules.
- Has some familiarity with the module system of OCaml or Standard ML, from a user’s perspective (not necessarily knowing their internals).
- Has familiarity with 1ML’s design goals as outlined in the paper introduction.
- Has not necessarily read the prequel, F-ing modules, the earlier paper by Rossberg, Russo, and Dreyer on compiling module calculi to Fω.
Goals
By the end of this series of articles, the reader will have the tools they need to understand every single inference rule, including those of the type inference algorithm described in Section 7.
I also want to discuss some implementation concerns that are not mentioned in the paper (such as how to integrate type inference with levels).
Non-goals
A decent portion of the paper is dedicated to meta-theory (e.g. the proofs that the type system is sound, etc.). I have not read these sections in depth, and will not be discussing them here. It is reasonable for compiler authors to take these results as given and not focus on their details.
In some places, I will be criticizing the way the paper presents its ideas. Please do not take these as prescriptions for how it should have been written. I do not have the standing to make prescriptions like that; the paper was written with considerations other than the ones most important to me (such as brevity of the rules, ease of meta-theory, & an intended audience of other academics).
System Fω and the architecture of the type system
At a basic level, 1ML is really just sophisticated syntactic sugar for a version of the lambda calculus called System Fω. If you're not familiar, this is basically the core type system of Haskell, including higher-kinded type constructors, with all polymorphism being explicit. 1ML terms (I'll use term and expression interchangeably) are converted into Fω terms and 1ML types are converted into Fω types.
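If you haven't seen it, here is the flavor of Fω (these examples are mine, in standard Fω notation; $\Omega$ is the kind of types, which we'll come back to below). Polymorphism is introduced by explicit type abstraction $\Lambda$ and eliminated by explicit type application, and type-level functions give higher kinds:

$$\mathsf{id} \;=\; \Lambda\alpha{:}\Omega.\ \lambda x{:}\alpha.\ x \quad:\quad \forall\alpha{:}\Omega.\ \alpha \to \alpha$$

$$\mathsf{Pair} \;=\; \lambda\alpha{:}\Omega.\ \lambda\beta{:}\Omega.\ \{\mathsf{fst} : \alpha,\ \mathsf{snd} : \beta\} \quad:\quad \Omega \to \Omega \to \Omega$$

Nothing is inferred here: using $\mathsf{id}$ at bool requires writing the type argument explicitly, as in $\mathsf{id}\ \mathsf{bool}\ \mathsf{true}$.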
The specific structure of this translation is best saved for a later article. For now, I'll say that whenever a type appears in the surface syntax, it is immediately converted to an Fω type written with the variable $\Xi$. After this translation, the rules completely forget about the surface type and don't touch it again. This is why we see similar constructs written with two different syntaxes, like the surface function type $T_1 \to T_2$ and the semantic function type $\Sigma \to \Xi$.
This may seem a little unfamiliar; I was used to seeing type systems that don't distinguish between the surface syntax for types and the internal representation of types. But of course, a routine like this exists in any real-world type checker; it's just that many papers would consider it "out of scope" and not discuss it.
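As a minimal sketch of what this architecture might look like (all names here are my own invention, not the paper's):

```ocaml
(* Sketch: the parser produces surface types; elaboration translates them
   once, up front; the rest of the checker only ever sees the internal
   representation. *)
module type ELABORATION = sig
  type context

  (* Surface syntax for types, as written by the user. *)
  type surface_typ =
    | S_path of string list                  (* e.g. M.t *)
    | S_arrow of surface_typ * surface_typ
    | S_sig of (string * surface_typ) list   (* { x : T; ... } *)

  (* The internal representation (the paper's semantic types);
     kept abstract in this sketch. *)
  type internal_typ

  val elab_typ : context -> surface_typ -> internal_typ
end
```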
Technical note: order of record fields
Whenever a type system defines a class of objects $X$, it will often come with some invariants ("every $X$ has property $P$") and some equivalences ("any $X_1$ and $X_2$ are equivalent if such-and-such condition holds, even if they aren't the same as written"). In situations like this, it is useful to pay attention to which properties and equivalences are being enforced through the grammar of $X$, and which are defined using the inference rules.
Unfortunately, when it comes to how Fω's record field labels work, the paper isn't so clear. Fig. 3, which defines the syntax of Fω, has only this to say:

$$\tau \;::=\; \cdots \;\mid\; \{\overline{l : \tau}\} \;\mid\; \cdots$$

(See below for an explanation of the overline notation.)
However, Fig. 4, which provides typing rules, has this rule for typing field access:

$$\frac{\Gamma \vdash e : \{l : \tau,\ \overline{l' : \tau'}\}}{\Gamma \vdash e.l : \tau}$$
This rule only makes sense if we understand record fields to be intrinsically unordered (otherwise it could only ever project out the first field). I think we can also reasonably assume that the label list $\overline{l}$ is understood to contain no duplicates. That resolves our question; however, for reasons I'll get into later, I suspect a type checker will want to keep around the original order of field labels in practice.
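One way to square these requirements in an implementation (a sketch of my own, not from the paper): store fields in their declared order, but make every typing operation label-directed and order-insensitive.

```ocaml
(* Fields keep their written order; the invariant that labels are
   distinct is maintained elsewhere, not by this type. *)
type 'a row = (string * 'a) list

(* Field access looks a label up by name, ignoring position. *)
let project (r : 'a row) (l : string) : 'a option =
  List.assoc_opt l r

(* Two record types are equal up to reordering of their fields.
   With distinct labels, equal length plus pointwise lookup implies
   the rows are permutations of each other. *)
let row_equal (eq : 'a -> 'a -> bool) (r1 : 'a row) (r2 : 'a row) : bool =
  List.length r1 = List.length r2
  && List.for_all
       (fun (l, t1) ->
          match List.assoc_opt l r2 with
          | Some t2 -> eq t1 t2
          | None -> false)
       r1
```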
I want to linger on the type system of Fω for a bit longer. We can think of Fω type expressions as terms in a simply-typed lambda calculus whose base "type" is $\Omega$, the kind of types. This is the core of what gives Fω, and therefore 1ML, its power — type checking can invoke some form of computation at the type level. This power comes with a corresponding burden for the implementer, who must essentially build an interpreter for STLC into their type checker.
The key rule is this one:

$$\frac{\Gamma \vdash e : \tau \qquad \tau \equiv \tau' \qquad \Gamma \vdash \tau' : \Omega}{\Gamma \vdash e : \tau'}$$
In words: "whenever an Fω term is considered to have a type $\tau$, we could also consider it to have type $\tau'$ if $\tau'$ is another well-formed type which is equivalent to $\tau$."
The type equivalence rules are presented as two-directional equalities ($\tau \equiv \tau'$), but in practice, a compiler will want to think of these as one-directional reductions. You can use a technique like normalization by evaluation for this. Bringing types all the way to normal form whenever you construct them is a viable approach, but it gets slow for very large types; there are standard techniques available for being lazier (this is not 1ML-specific).
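As a sketch of what the reduction-oriented view means concretely, here is a tiny weak-head normalizer for a hypothetical Fω type representation (representation and names are mine; a serious implementation would use de Bruijn indices or closures rather than raw substitution, and would normalize more lazily):

```ocaml
type kind = K_type | K_arrow of kind * kind

type typ =
  | T_var of string
  | T_arrow of typ * typ
  | T_lambda of string * kind * typ   (* a type-level function *)
  | T_app of typ * typ                (* type-level application *)

(* Substitute [arg] for [x]; assumes binders have been alpha-renamed
   apart, so capture cannot occur. *)
let rec subst x arg = function
  | T_var y -> if String.equal x y then arg else T_var y
  | T_arrow (t1, t2) -> T_arrow (subst x arg t1, subst x arg t2)
  | T_lambda (y, k, t) ->
      if String.equal x y then T_lambda (y, k, t)
      else T_lambda (y, k, subst x arg t)
  | T_app (t1, t2) -> T_app (subst x arg t1, subst x arg t2)

(* Reduce beta-redexes at the head until a variable or lambda surfaces.
   Deciding tau ≡ tau' then amounts to comparing normal forms. *)
let rec whnf = function
  | T_app (t1, t2) ->
      (match whnf t1 with
       | T_lambda (x, _, body) -> whnf (subst x t2 body)
       | t1' -> T_app (t1', t2))
  | t -> t
```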
Semantic types and modeling them in code
Figure 6 defines the grammar for semantic types. It's important, so I'll reproduce it here:

$$
\begin{array}{llcl}
\text{(abstracted types)} & \Xi & ::= & \exists\bar{\alpha}.\,\Sigma \\
\text{(large types)} & \Sigma & ::= & \pi \;\mid\; [{=}\Xi] \;\mid\; \{\overline{l : \Sigma}\} \;\mid\; \forall\bar{\alpha}.\,\Sigma \to_\iota \Xi \\
\text{(small types)} & \sigma & ::= & \pi \;\mid\; [{=}\sigma] \;\mid\; \{\overline{l : \sigma}\} \;\mid\; \sigma \to \sigma \\
\text{(paths)} & \pi & ::= & \alpha\ \overline{\bar{\sigma}} \\
\text{(purity)} & \iota & ::= & \mathsf{P} \;\mid\; \mathsf{I}
\end{array}
$$
Note: the overline notation, as in $\{\overline{l : \Sigma}\}$, is meta-theoretic syntax that just means "zero or more" (sometimes including delimiting commas). For those familiar with Rust's macro system, it is comparable to something like { $($l:label : $sigma:large_typ),* }.
Large and small semantic types
The first thing we should observe is that large and small types aren't really defining a new grammar, but instead a subgrammar of Fω types (so we have $\sigma, \Sigma \subseteq \tau$). This is why the metavariable $\tau$ doesn't appear much in the rest of the paper from now on — most of the Fω stuff going on will be restricted to semantic types.
The only wrinkle is the new type formers: the singleton $[{=}\Xi]$ and the function type with a purity annotation $\forall\bar{\alpha}.\,\Sigma \to_\iota \Xi$. The paper defines these via desugaring to Fω types with single-field records, but an implementation will find it easier to treat them as their own type constructors.
Paths
The path grammar is a little wonky. It allows repetition of $\sigma$ in two different ways, suggesting that a path is an Fω type variable applied to a list of lists of small types, like $\alpha\ \bar{\sigma}_1\ \bar{\sigma}_2 \cdots \bar{\sigma}_n$. My guess is that this is just a mistake; it suggests that paths have structure beyond the Fω types of which they are supposed to be a subgrammar, and nothing in the rest of the paper goes wrong if we just treat it as $\alpha\ \bar{\sigma}$.
I honestly find the whole idea of a path a little poorly motivated. My best attempt to articulate the role paths play in the rest of the paper is that they enforce that a semantic type is (roughly) in normal form (since type-level function applications always have a variable at their head), and that only small types can be applied as arguments to functions. “Path” is really a misnomer from this perspective. (The fact that a set of paths is an input to the subtyping relation is actually a red herring, as I’ll discuss in a later article.)
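To make the normal-form reading concrete (examples mine): $\alpha\ \sigma_1\ \sigma_2$ is a path, since its head is a variable; $(\lambda\beta{:}\Omega.\,\beta)\ \sigma$ is not, since it is a redex that should have been reduced away; and $\alpha\ \Sigma$ is ruled out when $\Sigma$ is large, since only small types may appear as arguments.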
Abstracted types
The abstracted type grammar $\Xi ::= \exists\bar{\alpha}.\,\Sigma$
risks giving the reader poor intuitions.1
You should think of “abstracted types” as corresponding to signatures (a.k.a. module types)
in traditional descriptions of ML.
They can contain one or more abstract types (the $\bar{\alpha}$),
but thinking of them as existentially quantified is too limiting,
since they can be specialized to concrete types after the fact
by constructs like T where type t = bool.
As an illustration of this, you can look at the desugaring rules
on page 25,
which say that the surface signature { type t; x : t } gets desugared
to $\exists\alpha.\,\{\mathsf{t} : [{=}\alpha],\ \mathsf{x} : \alpha\}$.
This is an Fω type which is provably useless!2
But this signature isn’t useless — we could readily imagine a functor in OCaml having it as an argument.
In my opinion, writing $\exists$ here is an abuse of notation. A better mental model of an abstracted type is something like a pair $(\bar{\alpha},\ \Sigma)$ — a large type expression that has a number of free variables, and is equipped with a description of those variables (their kinds), but does not prescribe an interpretation for the variables, allowing them to later become existentially quantified, universally quantified, or substituted with concrete types. This is what the paper calls translucency; if you're familiar with nominal sets, you can also think of it as a binder.
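To illustrate (my own example), take the pair $(\alpha,\ \{\mathsf{t} : [{=}\alpha],\ \mathsf{x} : \alpha\})$ from the desugaring above. The same pair can later be closed off in different ways:

$$\exists\alpha.\,\{\mathsf{t} : [{=}\alpha],\ \mathsf{x} : \alpha\} \qquad \forall\alpha.\,\{\mathsf{t} : [{=}\alpha],\ \mathsf{x} : \alpha\} \to_\iota \Xi \qquad \{\mathsf{t} : [{=}\mathsf{bool}],\ \mathsf{x} : \mathsf{bool}\}$$

as a sealed module's type, as (part of) a functor's type, and as the result of T where type t = bool, respectively.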
A summary of my implementation recommendations
Don't create a separate data type for Fω types $\tau$; instead, create a data type that represents large types $\Sigma$.
Contexts contain large types.
Don’t have a separate data type for small types; it will result in too much code duplication. Instead, smallness should be an invariant not tracked by the type system.
Do create a separate data type for $\Xi$, along the lines of:

```ocaml
type signat = (type_variable * kind) list * large_type
```

Introduce a dedicated constructor for $[{=}\Xi]$ in your large type ADT.
Either have a dedicated constructor for $\forall\bar{\alpha}.\,\Sigma \to_\iota \Xi$, or separate it out based on $\iota$ to model the syntactic invariant for pure functions ($\forall\bar{\alpha}.\,\Sigma \to_{\mathsf{I}} \Xi$ and $\forall\bar{\alpha}.\,\Sigma \to_{\mathsf{P}} \Sigma$).
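Putting the above together, here is a sketch of these recommendations in OCaml (constructor names and the details of is_small are my own; the exact definition of smallness should be checked against Fig. 6):

```ocaml
type type_variable = string

type kind =
  | K_type                        (* the base kind of types *)
  | K_arrow of kind * kind

type purity = Pure | Impure

(* Large types.  Small types are represented as large types satisfying
   an invariant that the host type system does not track. *)
type large_type =
  | Path of type_variable * large_type list   (* head variable, small args *)
  | Record of (string * large_type) list
  | Singleton of signat                        (* the [= _] former *)
  | Fun of (type_variable * kind) list * large_type * purity * signat

(* Abstracted types, modeled as a binder: variables with kinds, plus a body. *)
and signat = (type_variable * kind) list * large_type

(* Smallness as a checked invariant rather than a separate data type. *)
let rec is_small (t : large_type) : bool =
  match t with
  | Path (_, args) -> List.for_all is_small args
  | Record fields -> List.for_all (fun (_, t) -> is_small t) fields
  | Singleton ([], body) -> is_small body
  | Singleton _ -> false
  | Fun ([], dom, _, ([], cod)) -> is_small dom && is_small cod
  | Fun _ -> false
```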
Wrapping up
That's all I have time for today. In the next entry, I'll cover the surface syntax of 1ML, the desugaring judgement $\Gamma \vdash T \rightsquigarrow \Xi$, the role of the type type, and how I think about phase separation.
Footnotes
1: The use of this notation goes back at least to F-ing modules, so there is precedent for it, and it makes the rules more concise to not have to coerce from $\Xi$ to a "real" existential type, but I still don't like it. ↩︎
2: This type is isomorphic to $\exists\alpha.\,\alpha$, a type whose values cannot be distinguished in any way — there's no way to write a function that isn't constant. ↩︎