Guest post by Ingo Wechsung.

Advanced Parametric Polymorphism

Every programmer tries to avoid duplicate work. Whenever we identify a common pattern, we try to abstract over it such that we can reuse a generalized solution in many forms (i.e. polymorphically). The two main approaches to polymorphism are object-oriented subtyping and parameterized types.

Parametric polymorphism is established in mainstream programming languages for quite some time, though the term is unfamiliar to many. For example, in Java, it goes under the name of generic types.

The basic idea is simple: in the type declaration of functions and data (or methods and classes), types can be given as type variables as long as their concrete types can be determined later. Such type variables may be introduced at certain places, depending on the language. Java for example requires explicit introduction of type variables. Later, when the function or data is used, those formal type variables are instantiated with concrete types (which could well be other type variables that are valid in the context).

Here is an example in Frege that shows the abstract type declaration for the classical map function, its implementation, and its usage for the concrete type of integer lists and a usage thereof.

Classical Haskell map function

map ∷ (a→b) → [a] → [b]   -- can be implicit
map f []     = []
map f (x:xs) = f x : map f xs

-- instantiation is always invisible
sqrs = map (^2) [1..10]
-- sqrs == [1, 4, 9, 16, ... , 100]

Here is an approximation of how this would look like in Java:

Hypothetical Java map function

// generic types are explicit
public static<A,B> List<B> map(Function<A,B> f, List<A> xxs) {
    // ... details omitted ...
}

// instantiation is visible
static List<Integer> sqrs = <Integer, Integer>map(
	x -> x * x,
	Arrays.asList(1,2,3,4,5,6,7,8,9,10));

Despite Frege type inference being able to infer the type of map entirely on its own, it is customary to provide the type declaration explicitly. (Tools like the eclipse plugin make this a one-click action.)

The point is that in Frege, the introduction of type variables can be implicit, and the instantiation of type variables is always invisible. Nevertheless, it can be deduced that map is instantiated at type (Int→Int) → [Int] → [Int] on the right hand side of the definition of sqrs. But because instantiation is invisible, many Haskell and Frege programmers don’t even know that it exists at all.

In contrast, the Java code shows clearly both the mandatory introduction of type variables in the method declaration as well as their instantiation. The Java compiler can sometimes - but not always - infer those instantiations as well.

This chapter is not about syntactic convenience!

It is about a concept that belongs closely to parametric polymorphism but is less widely known: higher ranked types.

It should provide insight why noble and honourable efforts like FunctionalJava, HighJ, Vavr (formerly JavaSlang), and others are inherently limited in their abstraction capabilities by the language they build upon, because Java doesn’t support higher ranked types, at least not directly.

When generic is not polymorphic enough

Suppose your code contains a repeated use of a function (reverse or drop n) in a pattern like "first apply that function to the first list, then to the second list, and then combine the resulting lists with zip".

A repeated pattern

zip (drop n  somelist) (drop n  otherlist)
zip (reverse somelist) (reverse otherlist)

In the repeated use of this pattern, the types of reverse and drop n look identical.

The type of both reverse and drop n

∀a.[a] → [a]  --(1)

(1) For the sake of clarity, we provide types using explicit quantification (∀a.) in this article. Giving this quantification is optional and usually omitted in real code.

There are even more examples for functions of this type that we might want to use.

Functions of type ∀a.[a] → [a]

reverse
tail
init
drop 10
take 20
id

The intent is clear: we want to combine two lists, but only after having applied the same transformation to both lists. In short, what we have is

Showing the repeated use of f

zip (f somelist) (f otherlist)

for some appropriate function f, and now we want to get rid of the parentheses and the repetition of f (remember that f could be a rather lengthy expression itself).

It looks like we need a function fzip, such that we could write

Abstracting over the pattern with fzip

fzip f somelist otherlist

A possible implementation that comes to mind would be:

First attempt at fzip

fzip f [] [] = []
fzip f xs ys = zip (f xs) (f ys)

Indeed, what could be easier?

But wait! When we ask the REPL for the type of fzip, it tells us this:

The inferred type of fzip

∀a b.([a]->[b]) -> [a] -> [a] -> [(b,b)]

For some reason, it forces our two lists to have the same element type!

But the expressions that we wanted to refactor didn’t have that restriction. After all, the transformation functions we intent to use are generic in the list element type. They do work on any list.

Anyway, as it stands, applying the planned refactoring would result in type errors in all places where the first and the second list are not of the same type. For example we cannot fzip a list of characters with a list of boolean values.

So, what is wrong with our fzip?

To understand the case, one needs to remember what has been said about instantiation above. In an expression like

This will not work

fzip reverse ['a', 'b', 'c'] [false, true]
-- type error in expression [false,true]
--    type is : [Bool]
--    expected: [Char]

at what type should reverse get instantiated? If we choose

[Char] → [Char]

it wont be able to reverse boolean lists. And if we choose

[Bool] → [Bool]

it wont be able to reverse the character list.

In the above example, the compiler chooses to instantiate reverse at type [Char] → [Char] according to the character list argument, and therefore it expects the remaining argument to have the same type. After all, this is what the type of fzip demands! This explains the error message.

But why is this instantiation needed at all? It is needed because of a restriction of type inference in the Hindley-Milner type system, which forms the base of the type systems of languages like ML, Haskell, F# and Frege. This restriction says that lambda bound values (you can read this as "function arguments") are assumed to be monomorphic. And this needs to be so because otherwise type inference would become undecidable.

Ranking Types

Another way to put this is that type inference à la Hindley-Milner (in the following HM for short) can only deal with polymorphism of rank 1. Yet another way to put this is that rank 1 types are exactly those polymorphic types that a HM algorithm can infer. Practically, this means that in languages that obey strictly to HM, higher order functions can only take monomorphic functions or functions that are instantiated at a monomorphic type. To be blunt, our fzip can’t be written in ML or F#!

Higher rank types

A rank 2 type is a function type where a rank 1 type appears as argument, that is, left of the function array. Generally, a type of rank k is a function type that has a type of rank (k-1) in argument position.

Think about this for a moment! There is an infinite number of ranks, and each rank is inhabited by an infinite number of types. Isn’t that great?

Using Higher Ranked Types

Fortunately, while type inference is undecidable for higher ranked types, type checking is not. That is, a computer cannot find a higher ranked type for some expression without further information. But given a type and an expression, it can decide whether the expression can possibly be of this type.

The type checkers of GHC (with language extension RankNTypes) and Frege employ this fact and allow polymorphic functions as arguments.

To make this work, the type of a function that takes polymorphic arguments must be annotated, or at least the polymorphic argument itself must be annotated accordingly, and type inference will do the rest.

When such an annotation is present, instead of looking for the type a functional argument needs to be instantiated at, the type checker just checks if the argument’s type is at least as general as the annotated type.

Hence, the solution to our problem is simply to point out that we want our function argument f to be polymorphic. We can do this by providing the following annotation for fzip:

Making fzip work with higher rank polymorphism

fzip ∷ (∀ a.[a] → [a]) → [x] → [y] → [(x,y)]
--     ---------------                       universally quantified
--                                           polymorphic type of f
fzip f xs ys = zip (f xs) (f ys)

The code for fzip stays the same! But the type now says that f is a function that takes a list and returns a list of the same type, for all possible list element types. In addition, the types of the lists f works on is now completely decoupled from the actual types of the list arguments. But since f works on all list types, it is safe to apply it to both arguments.

The crucial point is the universally quantified polymorphic type for the function argument. When in doubt what to write here, one can simply ask the REPL for the type of such a function.

Getting help from the REPL

:type reverse
[α] -> [α]

To get the quantified type, add the forall keyword (that can also be written ∀, if you like) and list all type variables that occur in the type. If you don’t like the type variable names, you can simply rename them. For example, the following types are indistinguishable for the compiler:

Type declaration variants

forall a.[a] → [a]
forall b.[b] → [b]
∀ quetzalcoatl.[quetzalcoatl] → [quetzalcoatl]

Alternatively, we can write fzip with an in-line type declaration for f only but without the full type declaration for fzip:

Annotating only f

fzip (f ∷ ∀a.[a] → [a]) xs ys = zip (f xs) (f ys)

though I find this much less readable.

We can now use fzip with a variety of functions. But the types of all of them need to be at least as polymorphic as the annotated type for f. For example, we cannot use fzip with a function of the restricted type [Int] → [Int], even if both lists are integer lists.

Here are some examples. I give the type of f as comment:

Working f functions

fzip id         [1..10] ['a'..'z']   -- ∀a. a  →  a
fzip (drop 3)   [1..10] ['a'..'z']   -- ∀a.[a] → [a]
fzip reverse    [1..10] ['a'..'z']   -- ∀a.[a] → [a]
fzip (map id)   [1..10] ['a'..'z']   -- ∀a.[a] → [a]
fzip tail       [1..10] ['a'..'z']   -- ∀a.[a] → [a]
fzip (const []) [1..10] ['a'..'z']   -- ∀a b.a → [b]

This is it. We will come back to this and discuss the shortcomings as well as possible improvements of the higher rank solution.

Meanwhile, here is some homework for the very interested ones:

Why not be even more general and let the type of f be ∀a b.[a] → [b]? (The last example hints at one reason.)
(for true Java experts) Write fzip in Java without using casts or @SuppressWarnings so that it compiles without warnings! (Yes, it is possible.)

Java solution

Marimuthu’s proposal

Advanced Parametric Polymorphism