Here is a straightforward, practice-driven motivation for relational parametricity.
What is relational parametricity?
Relational parametricity is a technique for proving "free theorems". This technique is based on using arbitrary (many-to-many) binary relations instead of ordinary functions (which are equivalent to many-to-one relations).
"Free theorems" are laws that are automatically satisfied by polymorphic functions when the code of those functions is "fully polymorphic". Knowing that these laws hold, the programmer can refactor code safely, as the result values are guaranteed to remain the same.
Code is "fully polymorphic" when all types other than Unit are treated as type parameters, there are no side effects, no external libraries or "built-in" functions, no run-time type identification or reflection. The code must treat all types as "opaque type parameters", that is, as arbitrary types about which nothing is known. The code may not even assume that those types are inhabited.
For instance, any lambda-term from System F is "fully polymorphic code" by this definition. So, any given lambda-term from System F will automatically satisfy a certain law. The form of that law depends only on the type of the term.
If we want to prove that this property is true, the technique of relational parametricity is pretty much the only known technique that is powerful enough and works in all cases. This technique can be generalized to many programming languages other than System F (lambda calculus with linear types, or with dependent types, etc.).
Details
In the practice of functional programming, one often uses polymorphic functions, that is, functions parameterized by a type, for example $\textrm{safeListHead}: \textrm{List}\,a \to \textrm{Maybe}\,a $. We notice that any function $\phi$ of type $\forall a.\,F\, a\to G\,a$, where $F$ and $G$ are covariant functors, will be automatically a natural transformation as long as the code is written in a purely functional, fully parametric manner: no side effects, no runtime type reflection, no external libraries. Such a function $\phi$ will automatically satisfy a naturality law:
$$ \forall (f:a\to b).\,\, \phi \circ \textrm{fmap}_F\, f = \textrm{fmap}_G\, f \circ \phi $$
Most of the examples in Wadler's paper "Theorems for free" are naturality laws of this kind. However, Wadler does not formulate the "free theorem" property in that way and does not point out which of his "free theorems" are naturality laws. The reason is that, for more complicated type signatures, the "free theorem" is not a naturality law but a more complicated law that is not necessarily written as a single equation satisfied by functions.
Now, we ask: how can we prove that any lambda-term of type $\forall a.\,F\, a\to G\,a$ satisfies the appropriate naturality law?
It turns out that the proof is quite hard unless we first generalize the statement we are proving. The proof must go by induction on the structure of the lambda-term. But when we look at some arbitrary code for a function $\phi$, we will usually find sub-expressions that are not themselves natural transformations with type signatures $\forall a.\,F\, a\to G\,a$. Sub-expressions can have arbitrary types, say, of the form $$\forall a.\,F\, a\to G\,a$$ where $F$ and $G$ are type constructors that is not necessarily covariant.
If we want to prove the naturality law by induction on the term, we have to use the inductive assumption that the naturality law already holds for all sub-expressions of that term. But, so far, we cannot formulate a naturality law for sub-expressions that are not themselves natural transformations.
So, we are obliged to ask: what is the generalization of the naturality law for a function of type $\forall a.\,F\, a\to G\,a$ where $F$ and $G$ are arbitrary type constructors (not necessarily covariant or contravariant)?
The naturality law for functors involves lifting an arbitrary function $f: a \to b$ to a covariant functor $F$, which gives us a function $\textrm{fmap}_F\,f$ of type $F \,a \to F \,b $. The type signature of $\textrm{fmap}_F$ is:
$$ \textrm{fmap}_F: (a \to b) \to (F\,a \to F\,b) $$
But we cannot do this lifting when $F$ is an arbitrary type constructor because $\textrm{fmap}_F$ only exists for covariant type constructors $F$.
Now, the key technique in the Reynolds-Wadler approach is to replace an arbitrary function $f: a\to b$ by an arbitrary binary relation $r$ between values of types $a$ and $b$. A binary relation $r$ is in general a many-to-many relation and is not equivalent to any function of type $a\to b$ or $b\to a$. Let us denote the relation types by $r: a \leftrightarrow b$.
Why does it help to replace functions by relations? Because, as it turns out, there is a function $\textrm{rmap}_F$ that lifts a relation $r: a \leftrightarrow b$ to a relation between values of types $F\,a$ and $F\,b$. The type signature of $ \textrm{rmap}_F$ is:
$$ \textrm{rmap}_F: (a \leftrightarrow b) \to (F\, a \leftrightarrow F \,b)$$
The $\textrm{rmap}_F$ is well-defined even if $F$ is neither covariant nor contravariant (unlike $\textrm{fmap}_F$ that exists only when $F$ is covariant). The function $\textrm{rmap}_F$ can be constructed for any $F$ that is built up from fully parametric type constructions (such as, in System F, the unit type, free type parameters, products, co-products, function types, recursive types, universally quantified types).
Now, the lifted relation $\textrm{rmap}(r)$ can be used to formulate the "relational naturality law":
$$ \forall (r: a\leftrightarrow b).\,\,\textrm{ if } (x,y)\in \textrm{rmap}_F\,r\textrm{ then } (\phi(x),\phi(y))\in \textrm{rmap}_G\,r $$
When $F$ and $G$ are covariant and when we choose the relation $r$ to be a function graph then the "rmap" operation reduces to "fmap". So, the relational naturality law reproduces the usual law of natural transformations.
But now we have a law that applies to any type signature, not necessarily restricted to covariant or contravariant type constructors. At this point, one can prove by straightforward induction that any lambda-term satisfies the relational naturality law.
The main technical difficulty in this approach is the definition of the operation $\textrm{rmap}_F$ for an arbitrary type constructor $F$. This definition must proceed by induction on the type structure of $F$. The paper by Wadler does not contain an adequately detailed explanation of how $\textrm{rmap}_F$ must be defined; neither does the paper by Reynolds. Those papers hide the details and may create an impression that $\textrm{rmap}_F$ is somehow already defined, or that it is obvious how it should be defined. But actually it is neither obvious nor easy to define $\textrm{rmap}_F$.
In my understanding, when we want to define the core construction of relational parametricity (mapping a type to a binary relation) for arbitrary type constructors, an explicit use of $\textrm{rmap}_F$ is required. But neither Reynolds nor Wadler talk about arbitrary type constructors. Wadler only gives some examples with List but does not explain what to do with other type constructors.
The definition of $\textrm{rmap}_F$ is especially nontrivial when $F$ is a recursively defined type constructor, or when $F\,x$ contains universally quantified types inside, for example: $$F\,x = \forall a. ((a\to x)\to a)\to a$$
After looking through many papers and tutorials and blog posts on relational parametricity, I did not find any sources that work out the details of "rmap" or explain why it is required in formulating the parametricity theorems.
Here is an example of a property one can derive via the parametricity theorem:
Given any (covariant) polynomial functor $L$, any parametrically polymorphic function $\phi$ with type signature:
$$\phi: \forall a.\, L (a \to a) \to a \to a$$ satisfies the following law: for all $p: L(a \to a)$, we have: $$ \phi\, p = \phi (\textrm{fmap}_L\, j\, p)\, \textrm{id} $$ where the function $j: (a \to a) \to (a \to a) \to (a \to a)$ is the function composition: $j\, h\, k = h\, \circ\, k$
To prove this property, one needs to be able to handle an arbitrary polynomial type constructor $L$. The papers of Reynolds and Wadler, or any other papers I have seen, do not demonstrate adequate techniques for proving properties of this kind.
Summary
The main motivation for considering relational parametricity is to obtain a powerful technique for deriving properties and laws for functions whose types involve universally quantified types. In particular, one can prove the naturality laws and other such laws that apply to fully polymorphic code with type parameters.
Without relational parametricity, the proof of naturality laws becomes extremely complicated and cannot be easily generalized to different kinds of lambda calculus.
The main reason relational parametricity works is that one can define a function $\textrm{rmap}_F$ transforming a relation of type $a \leftrightarrow b$ into a relation of type $(F\, a \leftrightarrow F \,b)$. The transformation $\textrm{rmap}_F$ can be defined for an arbitrarily complicated type constructor $F$, not necessarily covariant or contravariant. Using $\textrm{rmap}_F$, one can define a general "relational naturality law" for terms of arbitrarily complicated types. Then it becomes straightforward to prove that any fully parametric code will automatically satisfy that law.
The price paid for this power is that the result of the relational parametricity theorem is a law for relations, not for functions. A programmer needs somehow to derive an equation for functions from the relational law. This additional step is not trivial; practitioners need to build intuition and learn specific techniques for doing that in various cases.
There are currently no tutorials or books that explain this material in detail.
\X:Type. \a:X. if X = {(0,0), (1,0), (0,1), (1,1)} then 0 else a– Jules Dec 03 '14 at 18:27