Existential types seem like a natural mechanism for ensuring representation independence of programs. With existential types, we get the property that if two programs have the same existential type, no program context can tell them apart. This is a powerful idea. It corresponds nicely to what we might intuitively expect representation independence to mean: clients of a module shouldn't depend on the module's internal representation. If we make it so that clients have no way of finding out or affecting details of the module's implementation, then we get representation independence for free: we can be sure that clients don't depend on module internals, because they can't.
But universal types are another story. They don't seem like such a natural mechanism for representation independence. With existential types, we have the property that no program context can distinguish two expressions with the same existential type; that is, the existentially typed expressions are contextually equivalent. But with universal types, we have the property that no two program contexts can cause the same universally typed expression to behave differently. Representation independence seems to be all about flexibility: Implement your module any way you want! Pick any internal representation type you want! Do it again! As long as all your implementations present the same interface, you're okay! Let a thousand flowers bloom! But a universally typed expression is one that stubbornly does the same thing, that must do the same thing, regardless of what type it's instantiated with. Expressions of universal type can't know or care what type they're instantiated with; they can't decide to behave this way or that way at run-time depending on how they were instantiated. This is the parametricity property.
Yet, we're told in paper after paper that the representation independence we get from existential types is a consequence of the parametricity we get from universal types. How can that be? They seem so different in spirit; how could the former possibly follow from the latter?
Let me show you it!
To solve this mystery, first we have to observe that existential types can
be encoded using universal types. We can encode the existential type exists alpha. tau
as the following universal type (thanks, TaPL 24.3):
forall beta. (forall alpha. tau -> beta) -> beta
That thing's pretty hairy, but let's think about what it means. Suppose we have some
expression g
that has forall beta. (forall alpha. tau -> beta) ->
beta
as its type. Let's start with the forall beta
on the front. It means that g
is waiting to be applied to (or instantiated with) a type -- any type -- and we can't use it until we
do. Suppose we instantiate g
with Nat
, the type of natural numbers, for the time being. We do the
type application, replacing the occurrence of beta
with Nat
, and now
we've got an expression we'll call g'
, which has type
(forall alpha. tau -> Nat) -> Nat
What's g'
? We can see that it has an arrow type, so it must be a
function. In particular, it's a function waiting to be passed an
argument that's itself a function of type forall alpha. tau -> Nat
.
As soon as we give g'
one of those, it will be obliged to give us a
Nat
sooner or later.
Note that it doesn't matter what function of type forall
alpha. tau -> Nat
we give to g'
-- as long as we give it
some function of that type, we'll get a Nat
back.
But furthermore -- and this is important! -- when we give g'
that
function of type forall alpha. tau -> Nat
, it won't be
allowed to simply throw its argument away and give us our Nat
. For
example, g'
cannot just be
lambda f: (forall alpha. tau -> Nat). 5
even though that expression seems to have the right type. Why not? Because, remember, we
picked Nat
arbitrarily. We could have just as easily picked Bool
, and
then we'd need g'
to return something of type Bool
. So having 5
as
the body of g'
isn't going to cut it: we need the body of g'
to be of
whatever type we instantiated beta
with in the first place. What's
the only certain way to get something of that type? By applying f
,
of course!
Oh, wait, but f
's type is forall alpha. tau -> Nat
. Since it starts with forall
, we need to apply it to a type before we can use it. Any type will do. Let's choose the type Squishy
. Now we almost know how to write down g'
:
lambda f: (forall alpha. tau -> int). f Squishy ...
How do we fill in the rest? Well, after instantiating f
with Squishy
,
we get a function that will take a value of the type
tau
-but-with-occurrences-of-alpha
-replaced-with-Squishy
(or tau[alpha
:= Squishy]
, for short), and will give us a Nat
in return. We really
need that Nat
, because we need to be able to return something of type Nat
. So there had better damn well be a value available of type
tau[alpha := Squishy]
. Suppose that we have one handy, and let's call it glork
.
That leaves us with the following expression g'
:
lambda f: (forall alpha. tau -> Nat). f Squishy glork
Note that the expression f Squishy glork
evaluates to an int
-- just what we want.
Our whole expression g
of type forall beta. (forall alpha. tau ->
beta) -> beta
, then, is
Lambda beta. lambda f: (forall alpha. tau -> beta). f Squishy glork
where Squishy
is any type whatsoever and glork
is a value of type
tau[alpha := Squishy]
. In other words, we can manufacture a value of type
forall beta. (forall alpha. tau -> beta) -> beta
, just as long as we
have Squishy
and glork
. How do those requirements stack up to the
requirements needed for manufacturing a value of type exists alpha. tau
?
They're exactly the same. A value of type exists alpha. tau
is a package comprising some type S
, together with a term of type
tau[alpha := S]
. (These components correspond, respectively, to the
hidden, internal representation type of a module and the
implementation of the operations that its interface exposes. What's
shown to the world? Just the signature: alpha
, the name of
the abstract type, and tau
, the types of its operations.)
You need both of those components to get a value of type exists alpha. tau
. So it suddenly makes sense that forall beta. (forall alpha. tau -> beta) -> beta
is our encoding for exists alpha. tau
, because you need
the same two things to create values of the universally encoded type as you need to create values of the existential type.
Okay, so now that we've convinced ourselves that the encoding is
right, let's consider what would happen in the universally encoded
type if parametricity didn't hold -- if we had a way of finding out, at run-time, what types we were instantiating our universally typed expressions with. First of all, let's consider what would happen if we could find out what beta
was at run-time. If we could do that, we might be able to manufacture a value of type beta
, without having to invoke f
. And if we didn't have to invoke f
, then we wouldn't need Squishy
and glork
. So, the fact that we don't and can't know what beta
is -- parametricity! -- ensures that Squishy
and glork
have to exist.
That's half of it. Now consider what would happen if we could find out what alpha
was at run-time. If we could, then we wouldn't need Squishy
, because alpha
would be Squishy
, and knowing that, we might be able to manufacture glork
. But, again, thanks to parametricity, we don't and can't know what alpha
is, so we're once again assured that Squishy
and glork
must exist.
Parametricity makes existential-style data abstraction possible! Hooray for parametricity!