That language is an instrument of human reason, and not merely a medium for the expression of thought, is a truth generally admitted.
— George Boole
Fundamental to all functional languages is the most atomic notion of composition, function abstraction of a single variable. The lambda calculus consists very simply of three terms and all valid recursive combinations thereof:
This compact notation looks slightly different from what you're used to in Haskell
but it's actually not: \x -> x a
. This means
what you see in the picture above would translate to (\x -> x) (\y -> y)
, which
would be equivalent to writing id id
(which of course evaluates to id
).
The three terms are typically referred to in code by several contractions of their names:
- Var - A variable
- Lam - A lambda abstraction
- App - An application
A lambda term is said to bind its variable. For example the lambda here binds
Using the lambda calculus notation we write:
In other words,
The lambda calculus is often called the "assembly language" of functional programming, and variations and extensions on it form the basis of many functional compiler intermediate forms for languages like Haskell, OCaml, Standard ML, etc. The variation we will discuss first is known as untyped lambda calculus, by contrast later we will discuss the typed lambda calculus which is an extension thereof.
There are several syntactical conventions that we will adopt when writing lambda expressions. Application of multiple expressions associates to the left.
By convention application extends as far to the right as is syntactically meaningful. Parentheses are used to disambiguate.
In the lambda calculus, each lambda abstraction binds a single variable, and the lambda abstraction's body may be another lambda abstraction. Out of convenience we often write multiple lambda abstractions with their variables on one lambda symbol. This is merely a syntactical convention and does not change the underlying meaning.
The actual implementation of the lambda calculus admits several degrees of freedom in how lambda abstractions are represented. The most notable is the choice of identifiers for the binding variables. A variable is said to be bound if it is contained in a lambda expression of the same variable binding. Conversely a variable is free if it is not bound.
A term with free variables is said to be an open term while one without free variables is said to be closed or a combinator.
Multiple lambda abstractions may bind the same variable name.
Each occurrence of a variable is then bound by the nearest enclosing binder.
For example, the
There are three fundamental closed expressions called the SKI combinators.
In Haskell these are written simply:
s f g x = f x (g x)
k x y = x
i x = x
Rather remarkably Moses Schönfinkel showed that all closed lambda expressions can be expressed in terms of only the S and K combinators - even the I combinator. For example one can easily show that SKK reduces to I.
This fact is a useful sanity check when testing an implementation of the lambda calculus.
The simplest implementation of the lambda calculus syntax with named binders is the following definition.
type Name = String
data Expr
= Var Name
| App Expr Expr
| Lam Name Expr
There are several lexical syntax choices for lambda expressions, we will simply
choose the Haskell convention which denotes lambda by the backslash (\
) to
the body with (->
), and application by spaces. Named variables are simply
alphanumeric sequences of characters.
-
Logical notation:
$\mathtt{const} = \lambda x y . x$ -
Haskell notation:
const = \x y -> x
In addition other terms like literal numbers or booleans can be added, and these
make writing expository examples a little easier. For these we will add a
Lit
constructor.
data Expr
= ...
| Lit Lit
data Lit
= LInt Int
| LBool Bool
Evaluation of a lambda term (
A substitution metavariable will be written as
In detail, substitution is defined like this:
where
The fundamental issue with using locally named binders is the problem of name
capture, or how to handle the case where a substitution conflicts with the
names of free variables. We need the condition in the last case to avoid the
naive substitution that would fundamentally alter the meaning of the following
expression when
By convention we will always use a capture-avoiding substitution. Substitution will only proceed if the variable is not in the set of free variables of the expression, and if it does then a fresh variable will be created in its place.
There are several binding libraries and alternative implementations of the lambda calculus syntax that avoid these problems. It is a very common problem and it is very easy to implement incorrectly even for experts.
Alpha equivalence
Alpha equivalence is the property ( when using named binders ) that changing the variable on the binder and throughout the body of the expression should not change the fundamental meaning of the whole expression. So for example the following are alpha-equivalent.
Beta-reduction
Beta reduction is simply a single substitution step, replacing a variable bound by a lambda expression with the argument to the lambda throughout the body of the expression.
Eta-reduction
This is justified by the fact that if we apply both sides to a term, one step of beta reduction turns the left side to the right side:
Eta-expansion
The opposite of eta reduction is eta-expansion, which takes a function that is not saturated and makes all variables explicitly bound in a lambda. Eta-expansion will be important when we discuss translation into STG.
Evaluation of lambda calculus expressions proceeds by beta reduction. The variables bound in a lambda are substituted across the body of the lambda. There are several degrees of freedom in the design space about how to do this, and in which order an expression should be evaluated. For instance we could evaluate under the lambda and then substitute variables into it, or instead evaluate the arguments and then substitute and then reduce the lambda expressions. More on this will be discussed in the section on Evaluation models.
Untyped> (\x.x) 1
1
Untyped> (\x y . y) 1 2
2
Untyped> (\x y z. x z (y z)) (\x y . x) (\x y . x)
=> \x y z . (x z (y z))
=> \y z . ((\x y . x) z (y z))
=> \x y . x
=> \y . z
=> z
=> \z . z
\z . z
Note that the last evaluation was SKK which we encountered earlier.
In the untyped lambda calculus we can freely represent infinitely diverging expressions:
Untyped> \f . (f (\x . (f x x)) (\x . (f x x)))
\f . (f (\x . (f x x)) (\x . (f x x)))
Untyped> (\f . (\x. (f x x)) (\x. (f x x))) (\f x . f f)
...
Untyped> (\x. x x) (\x. x x)
...
In addition to application, a construct known as a let binding is often added to the lambda calculus syntax. In the untyped lambda calculus, let bindings are semantically equivalent to applied lambda expressions.
In our languages we will write let statements like they appear in Haskell.
let a = e in b
Toplevel expressions will be written as let
statements without a body to
indicate that they are added to the global scope. The Haskell language does not
use this convention but OCaml, StandardML use this convention. In Haskell the
preceding let is simply omitted for toplevel declarations.
let S f g x = f x (g x);
let K x y = x;
let I x = x;
let skk = S K K;
For now the evaluation rule for let is identical to that of an applied lambda.
In later variations of the lambda calculus let expressions will have different semantics and will differ from applied lambda expressions. More on this will be discussed in the section on Hindley-Milner inference.
-
0
-
1
-
2
-
succ
-
pred
-
not
-
and
-
or
-
add
-
mul
Probably the most famous combinator is Curry's Y combinator. Within an untyped lambda calculus, Y can be used to allow an expression to contain a reference to itself and reduce on itself permitting recursion and looping logic.
The
For example the factorial function can be defined recursively in terms of
repeated applications of itself to fixpoint until the base case of
For fun one can prove that the Y-combinator can be expressed in terms of the S and K combinators.
In an untyped lambda calculus language without explicit fixpoint or recursive let bindings, the Y combinator can be used to create both of these constructs out of nothing but lambda expressions. However it is more common to just add either an atomic fixpoint operator or a recursive let as a fundamental construct in the term syntax.
Where
Together with the fixpoint (or the Y combinator) we can create let bindings
which contain a reference to itself within the body of the bound expression.
We'll call these recursive let bindings, they are written as let rec
in ML
dialects. For now we will implement recursive lets as simply syntactic sugar
for wrapping a fixpoint around a lambda binding by the following equivalence.
let rec x = e1 in e2 = let x = fix (\x. e1) in e2
So for example we can now write down every functional programmer's favorite two
functions: factorial
and fibonacci
. To show both styles, one is written
with let rec
and the other with explicit fix
.
let fact = fix (\fact -> \n ->
if (n == 0)
then 1
else (n * (fact (n-1))));
let rec fib n =
if (n == 0)
then 0
else if (n==1)
then 1
else ((fib (n-1)) + (fib (n-2)));
Omega Combinator
An important degenerate case that we'll test is the omega combinator which applies a single argument to itself.
When we apply the
We'll call this expression the
Hackage provides quite a few pretty printer libraries that ease the process of
dumping out textual forms for our data types. Although there are some
differences between the libraries most of them use the same set of combinators.
We will use the Text.PrettyPrint
module from the
pretty package on Hackage.
Most of our pretty printing will be unavoidable boilerplate but will make
debugging internal state much easier.
Combinators
<>
Concatenation
<+>
Spaced concatenation
char
Renders a character as a Doc
text
Renders a string as a Doc
hsep
Horizontally concatenates a list of Doc
vcat
Vertically joins a list of Doc
with newlines
The core type of the pretty printer is the Doc
type which is the abstract
type of documents. Combinators over this type will manipulate the internal
structure of this document which is then finally reified to a physical string
using the render
function. Since we intend to pretty print across multiple
types we will create a Pretty
typeclass.
module Pretty where
import Text.PrettyPrint
class Pretty p where
ppr :: Int -> p -> Doc
pp :: p -> Doc
pp = ppr 0
First, we create two helper functions that collapse our lambda bindings so we can print them out as single lambda expressions.
viewVars :: Expr -> [Name]
viewVars (Lam n a) = n : viewVars a
viewVars _ = []
viewBody :: Expr -> Expr
viewBody (Lam _ a) = viewBody a
viewBody x = x
Then we create a helper function for parenthesizing subexpressions.
parensIf :: Bool -> Doc -> Doc
parensIf True = parens
parensIf False = id
Finally, we define ppr
. The p
variable will indicate our depth
within the current structure we're printing and allow us to print out
differently to disambiguate it from its surroundings if necessary.
instance Pretty Expr where
ppr p e = case e of
Lit (LInt a) -> text (show a)
Lit (LBool b) -> text (show b)
Var x -> text x
App a b -> parensIf (p>0) $ (ppr (p+1) a) <+> (ppr p b)
Lam x a -> parensIf (p>0) $
char '\\'
<> hsep (fmap pp (viewVars e))
<+> "->"
<+> ppr (p+1) (viewBody e)
ppexpr :: Expr -> String
ppexpr = render . ppr 0
\pagebreak