Annihilating Hilbert's 2nd Problem
TW: this post contains Haskell slander. Or, rather, slander of Haskellers. As has been established,1 Haskell is divinely inspired and correct, but man is involved.
Who of us would not be glad to lift the veil behind which the future lies hidden; to cast a glance at the next advances of our science and at the secrets of its development during future centuries?
– David Hilbert
In a previous post, I said that psycho diagrams are usually indicative of something I'm going to be mega interested in: "It's like glimpsing arcana, or an overly complicated diagram of an extinct theology, a crackhead's projection of metaphysical laws scribbled in soot," and this post on the λ-calculus is no exception, behold:
History
In 1902, David Hilbert presented a set of ten problems to the Bulletin of the American Mathematical Society,2 the second of which asked for a proof that the axioms of arithmetic are consistent, i.e. that a finite number of logical steps based upon them can never lead to contradictory results.
In the coming decades, Alan Turing, Kurt Gödel, and Alonzo Church would all independently answer this question (resoundingly, no).
Turing with his invention of the eponymous universal computation machine, which gave rise to imperative programming and all of computer science. Gödel with his work pioneering general recursive functions and specifically his second incompleteness theorem, which directly addressed Hilbert's second question. And Church, who is credited with the specification of the λ-calculus.
How it works
In the λ-calculus, unlike in imperative/Turing/von Neumann programming, there is no distinction between data and functions. Fundamentally, it is just a notation for substitution which gives rise to ~all of computing, for the same reason that Turing Machines are considered universal computation machines.
λ-calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation that can be used to simulate any single-tape Turing machine and was first introduced by mathematician Alonzo Church in the 1930s as part of his research into the foundations of mathematics.
λ-calculus is composed of three objects:
- Terms: variables, expressions, abstractions, and applications are all considered "terms"
- Abstractions: these are definitions of lambda functions
- Applications: these are the usages of the abstractions with other terms.
A term can be anything – arbitrary variables, function declarations, or invocations are all considered terms. In other words, if $M$ and $N$ are arbitrary terms, then so are $(\lambda x.\, M)$ and $(M\ N)$.
The funky looking thing sharing the parentheses with $a$ is an abstraction, which has the general form $(\lambda \langle var \rangle.\ \langle body \rangle)$,
where the slot to the left of the period delimiter holds the variable said to be "bound" by that lambda, and the body on the right of the period can contain any other term.
Function application is the process of combining or, y'know, applying terms. All applications are binary, taking two terms. The first term is the function, and the 2nd term is the argument, so the conventional Euclidean function notation $f(x)$ would be expressed $(f\ x)$.
Any combination of these three building blocks results in a valid lambda expression.
Currying and Left Associativity
It's worth noting right away that λ-calculus is Left Associative, meaning $f\ x\ y\ z$ parses as $(((f\ x)\ y)\ z)$.
Because of Left Associativity, we can do away with many of the parentheses and, though all proper lambda expressions are functions of a single variable, we can introduce some syntactic sugar called currying to allow us to define multi-variate functions, writing $\lambda x\,y.\ M$ as shorthand for $\lambda x.\ \lambda y.\ M$.
If we look closely, we can see that our function of two arguments returns a function of a single argument. In general, an $n$-argument function spits out a function of $n-1$ arguments.
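To make the sugar concrete, here's a small example of my own, with arbitrary free terms $a$ and $b$:

$$\lambda x\,y.\;x\ y \;\equiv\; \lambda x.\,(\lambda y.\;x\ y), \qquad f\ a\ b \;\equiv\; (f\ a)\ b$$

so "calling" the curried function with $a$ first yields another function, $\lambda y.\;a\ y$, which is still waiting for $b$.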
Evaluation, α-substitution and β-reduction
In the λ-calculus, we use two forms of substitution in order to evaluate our expressions. Recall that λ-calculus is, at its core, just a framework for rules of substitution of some symbols for other symbols.
α-substitution
The first and most rudimentary substitution is aptly named α-substitution, which just lets us rewrite any expression comprised of arbitrary symbols with other (unbound) symbols, e.g. $\lambda x.\,f\ x \to_\alpha \lambda y.\,f\ y$.
α-substitution also yields a notion of α-equivalence, which indicates that two expressions are structurally identical up to renaming of bound variables, e.g. $\lambda x.\,x =_\alpha \lambda z.\,z$.
We can further illustrate what is meant by a "bound" variable via:
In this example, the implied α-substitution is only applied to the bound variable, not the perhaps-poorly-named free term that is being passed as the argument to the abstraction.
β-reduction
β-reduction is a strategy for combining application with abstraction, and is the process by which we actually "evaluate" lambda expressions. Consider an expression of the form $(\lambda x.\ B)\ a$, where $x$ is the only bound variable and any other symbols appearing in the body $B$ are free terms. We can assume that the function body of the abstraction does not contain any nested lambdas which rebind $x$; if it did (unintentionally), we could first apply α-substitution to disambiguate. $a$ is the argument that will be substituted for the variable $x$ in the body of the abstraction. We don't know what $a$ is, since it's not bound by this abstraction, but it's implied that some term called $a$ exists.
To apply the expression, we use the application template: the result of the expression is its body, but with every instance of the bound variable replaced by the argument to the function, which is whatever term immediately follows the abstraction. These steps for substitution define the aforementioned application process.
In our example, $a$ would be the argument to the function defined by the abstraction. To apply, we just swap every instance of $x$ in the body of our abstraction with our argument $a$, and discard the scaffolding of the abstraction.
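For instance, with an abstraction whose body applies its bound variable to a free term $b$ (a toy example of my own choosing):

$$(\lambda x.\;x\ b)\ a \;\to_\beta\; a\ b$$

every $x$ in the body is swapped for the argument $a$, and the $\lambda x$ wrapper is thrown away.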
A slightly more complicated example:
Though this might seem confusing at first, it's just a prefix (🇵🇱) way to express the same concept of functions in standard algebra, e.g. something like $f(x) = x + 1$ becomes $\lambda x.\;x + 1$.
Though we've yet to define addition, multiplication, exponentiation, or even numbers, this example should hopefully add some clarity to function application via β-reduction.
A more complicated example which also employs currying:
Here I've added a lot of training-wheel parens, which quickly becomes pretty noisy, but this is helpful lest we accidentally β-reduce incorrectly like so:
Boolean Arithmetic
With just these three components of terms, abstractions, and application, we can develop boolean arithmetic from which we can also define notions of higher order operations.
We can define $\text{TRUE}$ as a function of two arguments which returns the first argument:
which can be applied like so:
Similarly, we can define false as the binary function which selects its second argument:
e.g.
In this way, True and False are just tuple selectors. We can construct a ternary operator like so:
where $p$ is our predicate condition, $a$ is the case to apply if $p$ is $\text{TRUE}$, and $b$ the case if $p$ is $\text{FALSE}$.
Boolean arithmetic also motivates negation, $\text{NOT}$, a unary3 operator which just maps $\text{TRUE}$ to $\text{FALSE}$ and vice versa. We can define it like so:
We can also define logical conjunction and disjunction:
Here, $p$ is acting as a selector. If $p$ is false, then our whole abstraction needs to be false. Recall that $\text{FALSE}$ selects the 2nd argument, and our two arguments to $p$ are $q$ and $p$. If $p$ is true, then the whole abstraction needs to resolve to $\text{TRUE}$ iff $q$ is also true. This works since $\text{TRUE}$ will select the first value, $q$: if $q$ is true, then the whole abstraction resolves to true, otherwise it will resolve to false.
Similarly, we can derive logical disjunction, $\text{OR}$, by letting $p$ select between itself and $q$.
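For reference, the standard Church encodings of everything in this section, with $p$ and $q$ as the operands (the ternary-selector name $\text{IF}$ is mine):

$$\text{TRUE} \equiv \lambda x.\lambda y.\;x \qquad \text{FALSE} \equiv \lambda x.\lambda y.\;y \qquad \text{IF} \equiv \lambda p.\lambda a.\lambda b.\;p\ a\ b$$

$$\text{NOT} \equiv \lambda p.\;p\ \text{FALSE}\ \text{TRUE} \qquad \text{AND} \equiv \lambda p.\lambda q.\;p\ q\ p \qquad \text{OR} \equiv \lambda p.\lambda q.\;p\ p\ q$$

A quick sanity check: $\text{AND}\ \text{TRUE}\ \text{FALSE} \to_\beta^{*} \text{TRUE}\ \text{FALSE}\ \text{TRUE} \to_\beta^{*} \text{FALSE}$, since $\text{TRUE}$ selects the first of the two arguments it's handed.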
Church Numerals
Till now, we haven't even solidified the notion of quantity in the λ-calculus, despite the inclusion of arithmetic symbols in a few of the examples earlier. Like any extension from boolean logic to number theory, we'll arbitrarily assign the value of zero to one of our boolean values.
The astute reader will note that this is equivalent to our definition of $\text{FALSE}$, which is contrary to most von Neumann conventions where $\text{true} = 1$ and $\text{false} = 0$ are mere integer aliases. This encoding is entirely arbitrary, but standard in this context where we'll use Church encodings. Note, however, that even within the λ-calculus, there are myriad numerical encodings to choose from which have their own uses, including: Scott,4 Parigot,5 and Stump-Fu.6
Since all data are functions in the λ-calculus, we can define quantity as iterated calls to that function. Sadly, λ-calculus is 1-indexed. If we have one of something, we treat that as zero:
Iterating that value once with some function $f$, we get one:
Disregarding the notational abuse of $f^n$ to represent $n$ invocations of $f$, we can see that numbers are just functions which take a function and a variable and then iterate the function on that variable.
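Concretely, the first few Church numerals (standard encoding) are:

$$0 \equiv \lambda f.\lambda x.\;x \qquad 1 \equiv \lambda f.\lambda x.\;f\ x \qquad 2 \equiv \lambda f.\lambda x.\;f\ (f\ x) \qquad 3 \equiv \lambda f.\lambda x.\;f\ (f\ (f\ x))$$

i.e. the numeral $n$ is exactly the function that composes $f$ with itself $n$ times.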
Arithmetic
With numbers in hand, we can define basic arithmetic.
Succ
The first building block of which is the successor operation $\text{SUCC}$, which takes a numeral $n$ and produces the numeral that applies some function $f$ to an argument $x$ one additional time:
to apply $f$ to $x$, $n$ times we just apply $n$ itself, and to achieve the $(n+1)$-th application, we just apply $f$ once more:
We can see that the successor of $2$ is in fact $3$. We begin with a crucial α-substitution, and proceed with application via β-reduction,
which results in three applications of $f$ to $x$, which is perhaps the purest interpretation of what it means to be "three."
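Spelled out with the standard definition, and with the numeral's own binders α-renamed to $g, y$ so they don't collide with $\text{SUCC}$'s:

$$\text{SUCC} \equiv \lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x)$$

$$\text{SUCC}\ 2 \;=\; (\lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x))\ (\lambda g.\lambda y.\;g\ (g\ y)) \;\to_\beta\; \lambda f.\lambda x.\;f\ ((\lambda g.\lambda y.\;g\ (g\ y))\ f\ x) \;\to_\beta^{*}\; \lambda f.\lambda x.\;f\ (f\ (f\ x)) \;=\; 3$$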
plus, minus
We can define addition as iterated application of $\text{SUCC}$. For example, we can add a small constant to some arbitrary $n$ via:
so for arbitrary addition of $m + n$ we just iterate the $\text{SUCC}$ function:
which is colloquially pronounced "I'm succkin'."
Similarly, we can define subtraction in terms of the predecessor function $\text{PRED}$ given by:
from which we can implement subtraction intuitively as iterated application of $\text{PRED}$.
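In the standard encoding (these are the same `pred` and `sub` that show up in the .lam file at the end of the post):

$$\text{PLUS} \equiv \lambda m.\lambda n.\lambda f.\lambda x.\;m\ f\ (n\ f\ x) \quad\text{or equivalently}\quad \lambda m.\lambda n.\;m\ \text{SUCC}\ n$$

$$\text{PRED} \equiv \lambda n.\lambda f.\lambda x.\;n\ (\lambda g.\lambda h.\;h\ (g\ f))\ (\lambda u.\;x)\ (\lambda u.\;u) \qquad \text{SUB} \equiv \lambda m.\lambda n.\;n\ \text{PRED}\ m$$

$\text{PRED}$ is famously awkward (reportedly devised by Kleene in the dentist's chair), and note that $\text{SUB}$ is truncated: subtracting past zero just yields $0$.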
times
Multiplication can be achieved via iterated addition:
e.g.
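Using the standard encoding (which is the same `mul` that appears in the .lam file later), multiplication is just composition of numerals:

$$\text{MULT} \equiv \lambda m.\lambda n.\lambda f.\;m\ (n\ f)$$

$$\text{MULT}\ 2\ 3 \;\to_\beta^{*}\; \lambda f.\lambda x.\;f^{6}\,x \;=\; 6$$

since applying "$f$ three times" twice is applying $f$ six times.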
Exponentiation
Exponentiation is actually simpler than multiplication since we're just applying a number to itself $n$ times.
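In the standard encoding this is just numeral application:

$$\text{POW} \equiv \lambda m.\lambda n.\;n\ m$$

e.g. $\text{POW}\ 2\ 3 = 3\ 2$, i.e. "compose the numeral $2$ with itself three times," which works out to $2^3 = 8$.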
(in)equality and negative numbers
And finally, we can define some other useful relations between terms. First we define a "helper" function to determine if a quantity (such as the difference between two numbers) is zero:
Here we can see why we might want to perform some intermediary α-substitutions, but this expression is simple enough – it doesn't involve so many nested lambdas referencing the same variable that we can't keep the scope of the referenced value unambiguous. Nevertheless, armed with $\text{ISZERO}$, we'll define less-than-or-equal-to by checking whether the (truncated) difference is zero.
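Concretely, with the standard encodings (these mirror the `is_zero` and `leq` in the .lam file near the end of the post):

$$\text{ISZERO} \equiv \lambda n.\;n\ (\lambda x.\;\text{FALSE})\ \text{TRUE} \qquad \text{LEQ} \equiv \lambda m.\lambda n.\;\text{ISZERO}\ (\text{SUB}\ m\ n)$$

A numeral applies its first argument $n$ times: zero never applies it and returns $\text{TRUE}$ untouched, while anything bigger smashes it into $\text{FALSE}$ at least once.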
Here, if not sooner, we might wonder "what about negative numbers?" The Church encoding of the natural numbers thus far has not allowed for negative numbers, so something like $2 - 3$ –as presently defined– is actually still just $0$. In order to model the slew of positive and negative integers, we need to introduce a layer of abstraction similar to how we represent complex numbers as pairs of reals, for reasons that will become clear momentarily. First, we define a $\text{PAIR}$ as:
where $a, b$ are our tuple and $s$ is the selector allowing us to index either the first or the second element, where first and second are aliases for $\text{TRUE}$ and $\text{FALSE}$. Next, we redefine any quantity we might want to model which could be negative as a pair $(p, n)$ representing $p - n$, where $p$ and $n$ are both non-negative Church numerals. A negative number, then, is just the pair with the order of its elements swapped:
E.g. we might represent $2$ as the pair $(2, 0)$ s.t. $2 - 0 = 2$. Thus, $-2$ is just $(0, 2)$. Similarly, we can model the complex numbers (after we've modeled the reals (after we've modeled the rationals)) as pairs where:
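One standard way to spell the pair machinery (names mine; the post's .lam file later uses `cons`/`head`/`tail` for the same construction):

$$\text{PAIR} \equiv \lambda a.\lambda b.\lambda s.\;s\ a\ b \qquad \text{FST} \equiv \lambda p.\;p\ \text{TRUE} \qquad \text{SND} \equiv \lambda p.\;p\ \text{FALSE}$$

$$\text{NEG} \equiv \lambda p.\;\text{PAIR}\ (\text{SND}\ p)\ (\text{FST}\ p)$$

so an integer is a pair of naturals read as "positive part minus negative part," and negation just swaps the two halves.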
Encoding reals is a lot of work and you do not want to actually do it in the λ-calculus. But see for example the `etc/haskell` subdirectory of Marshall for a simple implementation of reals in pure Haskell. This could in principle be translated to the pure λ-calculus.7
Back to equality: we can just check that $m \leq n$ and $n \leq m$.
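Using the $\text{LEQ}$ from above (this mirrors the `eq` in the .lam file below):

$$\text{EQ} \equiv \lambda m.\lambda n.\;\text{AND}\ (\text{LEQ}\ m\ n)\ (\text{LEQ}\ n\ m)$$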
All together, here's a tabular reference of the operations we've defined thus far:
| term | abstraction |
| --- | --- |
| $\text{TRUE}$ | $\lambda x.\lambda y.\;x$ |
| $\text{FALSE}$ | $\lambda x.\lambda y.\;y$ |
| $\text{NOT}$ | $\lambda p.\;p\ \text{FALSE}\ \text{TRUE}$ |
| $\text{AND}$ | $\lambda p.\lambda q.\;p\ q\ p$ |
| $\text{OR}$ | $\lambda p.\lambda q.\;p\ p\ q$ |
| $0$ | $\lambda f.\lambda x.\;x$ |
| $\text{SUCC}$ | $\lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x)$ |
| $\text{PLUS}$ | $\lambda m.\lambda n.\lambda f.\lambda x.\;m\ f\ (n\ f\ x)$ |
| $\text{PRED}$ | $\lambda n.\lambda f.\lambda x.\;n\ (\lambda g.\lambda h.\;h\ (g\ f))\ (\lambda u.\;x)\ (\lambda u.\;u)$ |
| $\text{SUB}$ | $\lambda m.\lambda n.\;n\ \text{PRED}\ m$ |
| $\text{MULT}$ | $\lambda m.\lambda n.\lambda f.\;m\ (n\ f)$ |
| $\text{POW}$ | $\lambda m.\lambda n.\;n\ m$ |
| $\text{ISZERO}$ | $\lambda n.\;n\ (\lambda x.\;\text{FALSE})\ \text{TRUE}$ |
| $\text{LEQ}$ | $\lambda m.\lambda n.\;\text{ISZERO}\ (\text{SUB}\ m\ n)$ |
| $\text{EQ}$ | $\lambda m.\lambda n.\;\text{AND}\ (\text{LEQ}\ m\ n)\ (\text{LEQ}\ n\ m)$ |
Recursion and Combinators
What about recursive functions like the fibonacci sequence, or even just factorial? For the latter, we might try to construct a definition like so:
but right away we run into a problem. Within the body of this function, we don't know what the recursive reference refers to, and since lambda expressions must be self-contained, we cannot have a recursive reference to ourselves. Here, we're using more syntactic sugar to assign this whole expression to a named symbol, but even this isn't defined in pure λ-calculus, and doing away with our useful symbol assignment, we quickly run into the problem of infinitely unrolling our expression deeper into itself. Reverse vore, if you will, horrifying!
However, per the Church-Turing thesis, we know that anything that can be computed via a Turing machine must also be representable in the λ-calculus, so how would we represent a halting, recursive computation such as factorial?
Fixed Point Analysis
In standard algebra, a fixed point of an arbitrary function $f$ is anywhere the function intersects the line $y = x$. In other words, wherever $f(x) = x$. For example, $f(x) = 2x - 1$ has a singular fixed point at $x = 1$.
$f(x) = x^2$ has two fixed points at $x \in \{0, 1\}$, and a function like $f(x) = x + 1$ has no fixed points. In the λ-calculus, we can use this notion in conjunction with our rules of substitution which define function application to delay the application of an otherwise-recursive function call within another function.
For example, if we have some term $M$, we can equivalently express it as the application of an abstraction to $M$,
such that the original function becomes the argument to the abstraction, which then evaluates back to $M$. By using this form in place of a direct reference to $M$ in a recursive function, we can substitute the recursive call with an entirely new function which does not recurse, but instead applies the recursive ~effect
at the appropriate time during function application. There is, of course, a general form for this process. Several, actually.
The Y combinator
The first of note being the $Y$ combinator, which is a function that returns a fixed point for any input function:
and we can iterate the combinator as well to achieve any amount of nested "recursion":
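For reference, the standard definition and its defining unfolding:

$$Y \equiv \lambda f.\;(\lambda x.\;f\ (x\ x))\ (\lambda x.\;f\ (x\ x))$$

$$Y\ g \;=_\beta\; g\ (Y\ g) \;=_\beta\; g\ (g\ (Y\ g)) \;=_\beta\; \cdots$$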
The Turing Fixed Point combinator
Another useful combinator which achieves a nearly identical effect is Turing's fixed point combinator, denoted $\Theta$. Consider the helper function:
The Turing FPC is defined as:
Let's see it in action for some arbitrary function $g$ we want to find the fixed point of.
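Writing $A$ for the helper term (standard presentation):

$$A \equiv \lambda x.\lambda y.\;y\ (x\ x\ y) \qquad \Theta \equiv A\ A$$

$$\Theta\ g \;=\; A\ A\ g \;\to_\beta^{*}\; g\ (A\ A\ g) \;=\; g\ (\Theta\ g)$$

Note that, unlike with $Y$, here $\Theta\ g$ literally β-reduces to $g\ (\Theta\ g)$ rather than merely being β-equivalent to it.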
So, $\Theta\ g$ is a fixed point of $g$. But wait, does this mean we've solved the Riemann hypothesis, since we could use this to find all fixed points of the Riemann zeta function, which is isomorphic to a solution to the open problem?
Not quite... Unlike functions on the real numbers, every function in the λ-calculus is guaranteed to have a fixed point (and we have multiple strategies for finding one), but these fixed points can be arbitrary lambda expressions with no numerical interpretation from which we could recover a real-valued quantity.
For example, while applying a fixed point combinator to some encoding of $\zeta$ does yield an expression whose fixed point is itself a valid lambda expression, it's wholly meaningless spaghetti outside of the context of indirecting recursion.
Recursion Revisited (ha ha ha)
Returning to our stubbed definition of factorial, we now substitute the problematic recursive call with some placeholder function and consider the resultant fixed point of this function:
boom! There's no recursion here, since the expression is finite in closed form, but via β-reduction we've derived the equivalent recursive characteristic function.
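Putting it together, a sketch using the encodings from earlier (the placeholder $f$ stands in for the recursive call):

$$\text{FAC} \equiv Y\ \big(\lambda f.\lambda n.\;\text{ISZERO}\ n\;\;1\;\;(\text{MULT}\ n\ (f\ (\text{PRED}\ n)))\big)$$

Unrolling $Y$ hands the body a copy of itself as $f$, so the "recursive" call is really just another application, and the $\text{ISZERO}$ branch eventually cuts the unrolling off. (This form works under normal-order reduction; an eager evaluator needs the branches wrapped in dummy abstractions, exactly the `(\x -> ...) I` trick the .lam file at the end of the post uses.)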
Other Combinators
Many of these combinators get their names from the Russian logician Moses Schönfinkel,8 who worked in Hilbert's Göttingen group (hence the German names). Not all combinators are used for recursive effect – in general, a combinator is just a closed lambda expression, meaning it has no free variables.
Identity / Identitätsfunktion
The simplest combinator is an abstraction which just returns its argument:
Constant / Konstanzfunktion
Synonymous with $\text{TRUE}$, this is also known as the discarding combinator:
Substitution / Verschmelzungsfunktion
This combinator perhaps gets its name from the role it plays when converting from lambda expressions to combinatory logic statements.9 10 Consider the expression:
If we wanted to find the equivalent SKI-combinator expression for the application of this expression to some argument, the argument would be substituted into both sub-terms in place of the free variable, and then the first result would be applied to the second. However, in combinatory logic, there are no parameter variables, so in place of abstraction, we need a combinator which performs an analogous substitution operation on its arguments. This combinator takes two terms and an argument and performs the combinatorial analogue of substitution before applying one result to the other, e.g.
Alternatively, if we have an expression with a repeated subexpression, we can first apply and to move the duplicated subexpression into the appropriate position s.t. the result has the form:
which we can then Schmelzen or fuse together via the $S$ combinator:
The combination of these combinators gives rise to the SKI calculus,11 which alone is Turing Complete.
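The three combinators in question, for reference:

$$I \equiv \lambda x.\;x \qquad K \equiv \lambda x.\lambda y.\;x \qquad S \equiv \lambda x.\lambda y.\lambda z.\;x\ z\ (y\ z)$$

$S$ pushes its argument $z$ into both of its first two operands and then applies one result to the other, which is exactly the "substitute, then apply" behavior described above. Note that $K$ is just $\text{TRUE}$, and that $S\ K\ K$ behaves as $I$.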
Iota
The universal combinator
Composition / Zusammensetzungsfunktion
Swapping
Duplication
Self-application
Divergence (fork bomb)
Strict Fixed Point
This is another "recursive" combinator. It differs from e.g. the $Y$ combinator insofar as it will work with all the reduction orders suitable for its lazy counterpart (the $Y$ combinator). In addition, it will also work with CBV (call-by-value) and HAP (hybrid applicative) reduction orders, though it is not a drop-in replacement; in order for such expressions to work, they need to be modified so that the evaluation of arguments of conditionals and other terms that need to be lazy is delayed.
We say that it is strict w.r.t. the $Y$ combinator, since the $Y$ combinator defers evaluation until a reducible expression is at the "head" of the statement. The $Y$ combinator is therefore suitable for NOR (normal), HNO (hybrid normal), CBN (call-by-name), and HSP (head spine) reduction orders, but unsuitable for eager/applicative evaluation orders.
Thrush / Vertauschungsfunktion
Reverse application
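For completeness, here are standard definitions for the combinators named in the sections above (the strict fixed-point combinator $Z$ is the same `Z` that shows up in the .lam file at the end of the post):

$$\iota \equiv \lambda f.\;f\ S\ K \qquad B \equiv \lambda f.\lambda g.\lambda x.\;f\ (g\ x) \qquad C \equiv \lambda f.\lambda x.\lambda y.\;f\ y\ x$$

$$W \equiv \lambda f.\lambda x.\;f\ x\ x \qquad \omega \equiv \lambda x.\;x\ x \qquad \Omega \equiv \omega\ \omega$$

$$Z \equiv \lambda f.\;(\lambda x.\;f\ (\lambda v.\;x\ x\ v))\ (\lambda x.\;f\ (\lambda v.\;x\ x\ v)) \qquad T \equiv \lambda x.\lambda f.\;f\ x$$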
β-reducibility
It's worth noting here that an expression which can't be β-reduced any further is said to be in β-normal form. Additionally, throughout the process of β-reduction, there are oftentimes multiple choices of which term to reduce next (as hinted by the preceding discussion of evaluation orders).
For example, here's the reduction graph for which requires 8 β-reductions:
However, at node 1, the choice of which redex to reduce first is arbitrary, so here our graph forks:
And it expands once more before converging again on the resultant term:
This begs the question about which kinds of expressions are reducible, or have a β-normal form. There do exist a number of irreducible expressions such as the omega fork bomb:
Or its infinitely-expanding cousin:
This begs the question: does the order of operations of β-reduction matter? For a recursive function like our factorial, the graph is convergent for the first few hundred β-reductions, but then it forks into two branches, one of which is infinitely divergent since it attempts to "unroll" the recursive call via the combinator rather than leverage the base case. However, the Church-Rosser theorem states that the order does not matter: at any proverbial fork in the road, if there are two distinct reductions or sequences of reductions that can be applied to the same term, then there exists a term that is reachable from both results.12 13
Reduction Strategies
Whether or not our reduction traverses one of these intermediate states is determined by the β-reduction strategy we employ. There are a number of well-researched reduction strategies which lend themselves to different types of expressions.
Thus far, we've mainly been using normal order reduction, which reduces the leftmost, outermost β-redex first before proceeding to the sub-expressions contained within, or following the left-most expression. This effectively results in deferring the evaluation of the arguments to a function until the last possible moment.
The obvious counterpart to normal order reduction is called applicative order, which prioritizes the leftmost innermost expression. Applicative order is sometimes referred to as eager evaluation, which may be unfit for unrolling some expressions, as we saw with the divergent example above.
Consider the following expression under the two reduction orders:
Under normal order (NOR) reduction, the outer abstraction's argument is substituted, unevaluated, for both occurrences of the bound variable in the body before those copies are themselves evaluated:
Note that normal order reduction ends up evaluating that argument twice before getting the final answer. Applicative order (APP) reduction, on the other hand, will first reduce the argument expression and only then substitute the result into the body:
While applicative order seems like the clear winner in this example, it's trivial to construct a counter-example (any recursive function) for which applicative order reduction will not only lose to normal order reduction, but may fail to reduce to a β-normal form entirely, e.g. this expression nicely reduces via normal order:
whereas applicative order will attempt to unroll the fork bomb before exploiting the fact that the outermost abstraction discards its argument entirely.
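A concrete instance of that counter-example shape (my own, using the $\Omega$ fork bomb from above and a free term $z$):

$$(\lambda x.\lambda y.\;y)\ \Omega\ z \;\to_\beta\; (\lambda y.\;y)\ z \;\to_\beta\; z \qquad \text{(normal order)}$$

whereas an applicative-order evaluator insists on first reducing the argument $\Omega = (\lambda x.\;x\ x)(\lambda x.\;x\ x)$, which only ever reduces to itself, so it never reaches the step where the argument gets thrown away.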
Other Reduction Strategies
Along with normal and applicative order, other common strategies include:
- Head Spine (HSP): leftmost outermost, abstractions only reduced when in the head position,
- Hybrid Normal (HNO): a mix between Head Spine and Normal,
- Call by Value (CBV): leftmost innermost, no reductions inside abstractions,
- Hybrid Applicative (HAP): a mix between Call by Value and Applicative order
- Call by Name (CBN): leftmost outermost, no reductions inside abstractions
These seven reduction strategies produce reductions which fall into one of four normal forms: NF (top left), Weak NF (top right), Head NF (bottom left), WHNF (bottom right)14
| reduce args \ reduce under abstractions | yes | no |
| --- | --- | --- |
| yes | APP, NOR, HAP, HNO | CBV |
| no | HSP | CBN |
We can further categorize reduction strategies as pure or uniform (those which only involve that reduction strategy itself) vs. hybrid (those which use a uniform strategy for the reduction of the operator in an application).
| hybrid | uniform |
| --- | --- |
| NOR | CBN |
| HAP | CBV |
| HNO | HSP |
John Tromp
Now, this would be a sane place to end the post, but right around this point – while researching for this post and looking into "elegant" programs implemented in the λ-calculus – I found John Tromp. Where to begin with this guy...
Gregory Chaitin paraphrases John McCarthy on his invention of LISP as: "This is a better universal Turing machine. Let's do recursive function theory that way!" Chaitin continues: "And the funny thing is that nobody except me has really I think taken that seriously."
He was the first to strongly solve Connect 4 (with four-and-a-half years of computation time)15 which warrants its own post in the computational game theory series, but he's also done a shitton of similarly SIGBOVIK-esque work in the λ-calculus.
BLC
The λ-calculus is already the most minimal formalization of ~all of computation, but Tromp took it a step further by encoding all of it ("all" is a stretch, since we know it consists of only like 3 things) in binary which constitutes one of the cooler esolangs17 I've ever encountered.
Starting from first principles, we define:
Inputs and outputs to a BLC program are lists built from $\text{cons}$ pairs, where a pair is just an object holding two other values:
We can use $\text{cons}$ to cons-truct a $\text{nil}$-terminated list like so:
with this definition of , we can leverage other methods to comprehend our lists like so:
suffice it to say, we can build lists and use them. For a finite, binary string and a lambda term , we denote the list to be the -terminated list of its bits (i.e. ):
A λ-machine is a lambda term applied to an input binary stream, and the normalized result of this application is the output. E.g. for the identity function , and input , a λ-machine yields:
indicating that:
So, we can translate bit strings to lambda expressions, big whoop. What about the inverse: translating a lambda expression back into a bit string. We'll denote this process as such that a Universal Turing Machine could operate according to
The question is then: how to concisely encode the necessary, though few, semantics of the λ-calculus: abstraction, application, and variables.
De Bruijn Indices
Named for the Dutch mathematician,18 De Bruijn indices define a name-free notation for variables where variable symbols are instead substituted with the number of nested lambdas up to its binding λ. E.g.
To encode a lambda term in BLC, we express it via DB indices, and insert explicit application operators which are necessary to disambiguate applications from the stream of 1s and 0s which ought to be interpreted as functions/data. So the above expression would be "prepared" as:
Finally, we encode each element of this modified λ-calculus in binary as:
so, the result of encoding any prepared expression is just a stream of bits.
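As a small worked example of the whole pipeline (standard BLC encoding: abstraction ↦ $00$, application ↦ $01$, and a variable with de Bruijn index $i$ ↦ $i$ ones followed by a zero):

$$\lambda x.\;x \;\leadsto\; \lambda\,1 \;\leadsto\; 00\,10 = \texttt{0010}$$

$$\lambda x.\lambda y.\;x \;\leadsto\; \lambda\,\lambda\,2 \;\leadsto\; 00\,00\,110 = \texttt{0000110}$$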
The self-interpreter for BLC19 in pre-BLC is given by
which satisfies –for all closed terms –
and is a mere 206 bits long.
A Universal Turing Machine in BLC can be expressed as:
Visualizing λ
But he didn't stop there, oh no no no dear reader. This boy has got the tism and I'm buying what he's selling. Tromp devised a means of visualizing lambda20 expressions where:
- Abstractions are horizontal bars
- Variables are vertical bars spawning from the lambda they're bound to
- Applications are expressed by the branching structure connecting abstractions and variables
We'll begin by stepping through the fundaments and then applying Tromp's rules for lambda diagrams to some of the more complex expressions we've derived so far.
Furthermore, we can trace the application of more structured –though yet unreduced– lambda expressions. For example, the arbitrary function reduction:
(which can't be β-reduced further without additional terms, since this expression effectively selects the third element of a triple, and without knowing anything else about the free term , we can't simplify any further). Nevertheless, this reduction can be illustrated as:
A less-abstract21 example might be the β-reduction of logical negation which, without any arguments, resembles:
A pseudo-arithmetic operation like the successor function has the argument-less structure:
and for some (sanely small) argument, say $2$, we can pipe it in:
which is three! (not factorial, I'm just excited). Speaking of factorial, though, here's that:
Elegant λ programs
The problem I'm most interested in which seems least insane to attempt to implement in the pure lambda calculus and then attempt to diagram is the Sieve of Eratosthenes. Here's about where I would typically outsource an explanation to/from Wikipedia, or some other contextually relevant source, such as, idk, the Haskell website, which currently has:
primes = filterPrime [2..] where
filterPrime (p:xs) =
p : filterPrime [x | x <- xs, x `mod` p /= 0]
on the masthead as a sample of Haskell's syntax, but once upon a time (2015) was instead expressed:
primes = sieve [2..]
where sieve (p:xs) =
p : sieve [x | x <- xs, x `mod` p /= 0]
which incorrectly!!!! implies that the technique being implemented was a sieve, which sparked much derision23 amongst the most insufferable demographic of programmers: FP Haskellers.
So, we will briefly24 take a detour to completely understand the algorithm, its purely functional implementations (runtime complexity be damned), and pure-optimizations that we can make before taking a stab at implementing it ourselves, and building some tooling along the way to help keep me sane.
The Real Sieve of Eratosthenes
This section is largely informed by Melissa O'Neill's paper entitled The Genuine Sieve of Eratosthenes,25 which lampoons false sievery and discusses a number of not-quite-strictly-λ-pure optimizations (by which I mean: use of data structures that I cbf to implement and retain a legible diagram for).
The sieve function –as described by the 2nd century B.C. Greek mathematician– is as follows:
- Begin with a collection of numbers-to-be-sieved, sorted in ascending order, e.g. $[2, 3, 4, \ldots, n]$,
- Starting with the first number $p = 2$, declare it to be prime, and eliminate all multiples of that number in our list, starting from $p^2$ (e.g. $4, 6, 8, \ldots$ would be removed),
- Set $p$ to be the next non-eliminated number after $p$ and repeat.
For a fixed-size list, this algorithm terminates once $p$ exceeds $\sqrt{n}$. This process crucially differs from the naïve trial division algorithm claiming prime real estate over on Haskell dot org since it does not employ any notion of division. While trial division is asymptotically superior to a naive, list-based rendering of Eratosthenes' methodology,
we can cut him some slack since the notion of sieving a limit like ~a million probably would've knocked his sandals off. Additionally, we can make some slight tweaks to our Greco-faithful implementation which, without changing the functional effect of any of the steps, can save us a number of operations proportional to the input size (which will translate to –I shit you not– billions of β-reductions when computing more than a few dozen primes). A minimal imperative sketch of the steps as just described follows below.
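Here's that sketch: an array of flags and in-place crossing off, in Haskell for consistency with the rest of the post. This is entirely my own scaffolding (not from O'Neill's paper), just to pin down the algorithm before the functional gymnastics begin.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- The textbook sieve: for each surviving p, cross off p^2, p^2+p, ... up to n.
sieveUpTo :: Int -> [Int]
sieveUpTo n = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = runSTUArray $ do
      arr <- newArray (2, n) True
      forM_ [2 .. floor (sqrt (fromIntegral n :: Double))] $ \p -> do
        isPrime <- readArray arr p
        when isPrime $
          forM_ [p * p, p * p + p .. n] $ \m ->
            writeArray arr m False
      return arr

main :: IO ()
main = print (sieveUpTo 30) -- [2,3,5,7,11,13,17,19,23,29]
```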
Whereas the original algorithm crosses off all multiples of a prime at once, we perform these "crossings off" in a lazier way: crossing off just-in-time. For this purpose, we will store a table in which, for each prime $p$ that we have discovered so far, there is an "iterator" holding the next multiple of $p$ to cross off. Thus, instead of crossing off all the multiples of, say, $17$, at once (impossible, since there are infinitely many for our limit-free algorithm), we will store the first one (at $289$; i.e., $17^2$) in our table of upcoming composite numbers. When we come to consider whether $289$ is prime, we will check our composites table and discover that it is a known composite with $17$ as a factor, remove $289$ from the table, and insert $306$ (i.e., $289 + 17$). In essence, we are storing [JIT] "iterators" in a table keyed by the current value of each iterator.
O'Neill leverages Haskell's `Data.Map` to implement this approach:
sieve xs = sieve' xs Map.empty
  where
    sieve' []     table = []
    sieve' (x:xs) table =
      case Map.lookup x table of
        Nothing    -> x : sieve' xs (Map.insert (x*x) [x] table)
        Just facts -> sieve' xs (foldl reinsert (Map.delete x table) facts)
          where
            reinsert table prime = Map.insertWith (++) (x+prime) [prime] table
which is asymptotically better than the trial division algorithm for the same fixed input length. Also, I love that Professor O'Neill accounts for the non-unit cost of the arithmetic division operations utilized by unfaithful sieves, which would offset the cost of using a tree data structure. She continues to improve the performance of this approach by swapping out the native `Data.Map` for a bespoke `PriorityQueue`, which is clever since the proposed algorithm only ever needs to check the least element of the collection, i.e. the head.
She supposes that, given the existence of a `PriorityQueue` with the following API:
empty :: PriorityQueue k v
minKey :: PriorityQueue k v -> k
minKeyValue :: PriorityQueue k v -> (k,v)
insert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
deleteMinAndInsert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
we can milk Eratosthenes a bit more with minor adjustments:
sieve []     = []
sieve (x:xs) = x : sieve' xs (insertprime x xs PriorityQueue.empty)
  where
    insertprime p xs table = PriorityQueue.insert (p*p) (map (* p) xs) table
    sieve' []     table = []
    sieve' (x:xs) table
      | nextComposite <= x = sieve' xs (adjust table)
      | otherwise          = x : sieve' xs (insertprime x xs table)
      where
        nextComposite = PriorityQueue.minKey table
        adjust table
          | n <= x    = adjust (PriorityQueue.deleteMinAndInsert n' ns table)
          | otherwise = table
          where
            (n, n':ns) = PriorityQueue.minKeyValue table
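O'Neill's paper simply assumes a `PriorityQueue` exists; as a stand-in, here's a deliberately naive, list-backed sketch of that API (my own, not hers; a real heap is what you'd actually want). The one behavioral detail that matters is that duplicate keys must be allowed, since two different primes' iterators can momentarily sit on the same composite.

```haskell
module PriorityQueue
  ( PriorityQueue
  , empty
  , minKey
  , minKeyValue
  , insert
  , deleteMinAndInsert
  ) where

-- Entries kept sorted by key; duplicates allowed. O(n) insert, which is fine
-- for illustration but not for performance.
newtype PriorityQueue k v = PQ [(k, v)]

empty :: PriorityQueue k v
empty = PQ []

minKey :: PriorityQueue k v -> k
minKey (PQ ((k, _) : _)) = k
minKey _                 = error "minKey: empty queue"

minKeyValue :: PriorityQueue k v -> (k, v)
minKeyValue (PQ (kv : _)) = kv
minKeyValue _             = error "minKeyValue: empty queue"

insert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
insert k v (PQ kvs) = PQ (go kvs)
  where
    go []              = [(k, v)]
    go (e@(k', _) : es)
      | k <= k'        = (k, v) : e : es
      | otherwise      = e : go es

deleteMinAndInsert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
deleteMinAndInsert k v (PQ [])        = insert k v (PQ [])
deleteMinAndInsert k v (PQ (_ : es))  = insert k v (PQ es)
```

Imported qualified, this slots straight into the `PriorityQueue.*` calls in the sieve above.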
This approach also cleanly lends itself to lazily incrementing each iterator by $2p$ rather than $p$ (which would otherwise require redundant evaluation of even numbers), whereas imperative approaches would require additional, asymptotically non-trivial amounts of extra bookkeeping.
O'Neill notes that this simple elimination of even numbers improves performance by a noticeable constant factor (still not factorial, sorry), and that we can eke out some further performance boosts by skipping the roughly 3/4s of the remaining composites which are divisible by 3, 5, and/or 7.
As we saw above, to produce numbers that are not multiples of 2, we simply begin at 3 and then keep adding 2. To avoid multiples of both 2 and 3, we can begin at 5 and alternately add 2, then 4. We can visualize this technique as a wheel of circumference 6 with holes at distances of 2 and 4 rolling up the number line. In general, adding an additional prime p to the wheel multiplies the circumference of the wheel by p, and removes every pth remaining composite. Thus, there are usually diminishing returns for large wheel sizes: our wheel for the first four primes has circumference 210 (i.e., 2 × 3 × 5 × 7) with 48 holes, whereas the wheel for eight primes has circumference 9,699,690 and 1,658,880 holes, but eliminates fewer than 7% of the remaining composites.
A small wheel can be implemented as:
wheel2357 = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel2357
spin (x:xs) n = n : spin xs (n + x)
primes = 2 : 3 : 5 : 7 : sieve (spin wheel2357 11)
yielding the following performance gains:
And, having gracefully dunked on The Sleight on Eratosthenes, O'Neill concludes with a list-based faithful implementation of the Sieve shared by a friend and fellow prime-enthusiast Paul Pritchard (and nicely animated here26 by the chad himself) which I can bf to translate from Haskell to the λ-calculus:
primes = 2 : ([3..] `minus` composites)
  where
    composites = union [multiples p | p <- primes]

multiples n = map (n*) [n..]

(x:xs) `minus` (y:ys)
  | x <  y = x : (xs `minus` (y:ys))
  | x == y = xs `minus` ys
  | x >  y = (x:xs) `minus` ys

union = foldr merge []
  where
    merge  (x:xs) ys = x : merge' xs ys
    merge' (x:xs) (y:ys)
      | x <  y = x : merge' xs (y:ys)
      | x == y = x : merge' xs ys
      | x >  y = y : merge' (x:xs) ys
which makes careful use of laziness, since taking the union of the infinite list of infinite lists $[[4,6,8,10,\ldots],\ [9,12,15,18,\ldots],\ [25,30,35,40,\ldots],\ \ldots]$
is tricky unless we exploit the fact that the first element of the result is the first element of the first infinite list, hence the mindful definition of `union`. This implementation has a time complexity that is worse than trial division by a small factor, but past this point –as we veer into incoming λ-traffic– we don't particularly care anymore. Pritchard has numerous publications regarding sub-linear faithful sieves.27,28
Anyways, O'Neill leaves a wheel + list-based implementation of Pritchard's Sieve as an exercise to the reader.29
you psycho.
Sapere Aude: a devlog
With an end goal of producing the Tromp diagram for Pritchard's list-based implementation of the faithful Sieve (and then, depending on how many white hairs I have after the fact, adding a wheel to the mix), let us first begin with the diagramming codebase itself.
Paul Brauner graciously wrote this Haskell script to parse (custom) `.lam` files and translate them into an animation of the β-reduction to normal form (if any exists. If it doesn't exist, good luck recapturing that disk space bc this bad boi is gonna try).
Setting up this repo alone was a trial and a half since its last commit was May 2018. Notably, the first commercially available Apple Silicon chips30 weren't released until November 2020. Unsurprisingly, the versions of Stack, Cabal, and GHC Paul used are incompatible with the `arm64` instruction set, and are no longer receiving LTS. Upgrading to the oldest arm-compatible release of each of the requisite build tools breaks a gazillion dependencies that I –already woefully unfamiliar with Haskell's ecosystem– am unwilling to unwind; it ended up being easier to blow the dust off ye' old gaming box than sort out virtualization compatibility or containerization (alpine and VirtualBox merely added layers of indirection before running into the underlying issue that my `arm64` CPU simply cannot produce x86_64 instructions – whoda thunk it).
After getting the visualization repo set up, the next task was to translate Pritchard's Haskell-ish pseudo code into the `.lam` format which Paul's program was expecting. His program is great if your lambda expressions are valid, but less helpful if you are prone to introducing errata in the sea of parens and recursion necessary to implement the sieve as described. To draft an implementation, I turned first to a number of browser-based λ-calculus interpreters/parsers rather than fight more uphill battles of compatibility/tooling/poorly documented31 toy repos, but I quickly ran into a problem.
At this point, I understood that even "trivial" computations could unfurl into thousands of costly β-reductions, but I (clueless) had (naïve) faith in my browser's ability to help validate my implementations of even the basic requisite functions. https://lambster.dev/ in particular was "a pleasure" to use, and https://www.allisons.org/ll/FP/Lambda/ was surprisingly robust (the absence of any styling/navigation assistance on this site let me know that I was working with a real FPer), however neither was robust enough.
Alas, I resumed skimming repos, looking for "C/C++" or anything else that screamed "speed", with a small arithmetic expression as my benchmark.32 After determining that most of the "optimized" or "memory efficient" parsers were painfully old/incompatible, or just outright broken (back in my day, you got an F if your program segfaulted) even if I did manage to get them to build, I stumbled into the light.
God bless Rust
https://github.com/orsinium-labs/rlci is fire. Go look at it and tell me it's not fire.
Splendid for handling the basics, but 1) I dislike its syntax choices (significant whitespace – because, sans any other syntactic sugar like currying, we can actually do away with the periods altogether) and 2) I cbf to inject my own analytics/debugging crap into its AST parser. However, having tasted the splendor of a well-maintained codebase,33 I refined my search for a more hands-on λ-calculus Rust tool.
https://github.com/ljedrz/lambda_calculus is (almost) everything I wanted.
Naive Eratosthenes
For our sieve implementation, we'll need:
primes = Y
(λself.
λf.cons two (
minus (from three) (union (map multiples self))
)
)
And, quite frankly, the further I got, the more I realized I might have bitten off more than I could chew. While ljedrz provided a host of user-friendly APIs in his lib, it still required a number of modifications/additions before we could just drop in this algorithm. My (admittedly messy) fork of the repo with these additions can be found here. I have no ambitions of getting it merged; I have accepted my station in life as a python-peon.
So I found myself with what I thought was a faithful sieve implementation, but my program continued to overflow the stack when attempting to produce even a trivial number of primes – which is about when the white hairs started to pop up.
Rubber Duckies (aka hostage coworkers) >> Stack Overflow
[peter.murphy] So my sieve works for a finite amount of primes
I found the issue
It’s the reduction strategy
(because there’s several to choose from)
And because of the nested recursive calls (to `union` and `merge` and `primes` itself), I would have to define a custom reduction strategy. E.g. doing it by hand it works, however, to get even just 2, 3, 5, 7 it takes 1.1mil reductions
[will.hombre] There isn't a "Don't be stupid" strategy built-in?
[peter.murphy] Not really… like normal order reduction is the laziest, which is usually safe
except when you reduce to a point where a recursive call is in the “head” position, but it would be more beneficial to unroll the argument to the head instead of the head itself
which I have at least one instance of:
`multiples` and `primes` both spawn infinite lists on the same "line"
[O'Neill, citing Bird]
This code makes careful use of laziness. In particular, Bird remarks that “Taking the union of the infinite list of infinite lists [[4,6,8,10,..], [9,12,15,18..], [25,30,35,40,...],...] is tricky unless we exploit the fact that the first element of the result is the first element of the first infinite list. That is why union is defined in the way it is in order to be a productive function.”25
[peter.murphy] and while I get why this highlighted part is crucial and should lend itself to normal order reduction, I think you really want to unroll “one step” of multiples and primes in lockstep at a time
BFS the redux tree so to speak
but then at the same time, you need to prioritize a `take n` which will be in the head position, and so the trick is to get like 5 "steps" into `multiples` and `primes` each (exactly like we had on the whiteboard) to give you enough of a stream to `take` from, and there's no general reduction order for this hyper specific case lol
So, finally we arrive at an unfortunately finite example:
use lambda_calculus::{
    data::{
        list::pair::{cons, from, map, minus, take, union},
        num::church::mul,
    },
    *,
};

fn main() {
    let multiples = abs({
        app!(
            map(),
            abs!(1, app!(mul(), Var(2), Var(1))), // n * x
            from(Var(1)) // Generate the list starting from n
        )
    });

    // [2, 3, 5]
    let finite_primes = vec![
        2.into_church(),
        3.into_church(),
        5.into_church()
    ].into_pair_list();

    // builds an (infinite) list of infinite lists: [[4, 6, 8, ...], [9, 12, 15, ...], [25, 30, 35, ...]]
    let finite_composite_stream = app!(
        union(),
        app!(
            map(),
            app(take(), 4.into_church()),
            // Apply map to multiples for the current prime
            app!(map(), multiples.clone(), finite_primes.clone())
        )
    );

    // Generate the next prime candidates from the natural numbers
    let natural_numbers = app(app(take(), 9.into_church()), from(3.into_church()));

    // Subtract composites from natural numbers to get the primes
    let rest = app!(
        minus(),
        natural_numbers.clone(),
        finite_composite_stream.clone()
    );

    // Return the result by appending the base case (2) with the remaining primes
    let result = app!(cons(), 2.into_church(), rest.clone());

    println!("\n\n{}", beta(result, NOR, 0));
}
which produces the satisfactory output (after nearly a million reductions):
λa.a
  (λb.λc.b (b c))                                             // 2
  (λb.b (λc.λd.c (c (c d)))                                   // 3
  (λc.c (λd.λe.d (d (d (d (d e)))))                           // 5
  (λd.d (λe.λf.e (e (e (e (e (e (e f)))))))                   // 7
  (λe.e (λf.λg.f (f (f (f (f (f (f (f (f (f (f g))))))))))）  // 11
  (λf.λg.g)))))                                               // nil terminator
We can compare the efficiency of the various reduction strategies:
and observe that there's little noticeable difference in expression size or number of reductions required to obtain a β-normal form for our finite expression. Pure applicative order stack-overflows (unsurprisingly) somewhere between 3,000 and 4,000 β-reductions, with an expression size of 43,857 chars, and hybrid applicative order goes foom after 11,000 reductions:
Having verified with a finite example, I am satisfied to plug the infinite example into the diagramming application.
So now I just had to translate my working implementation of Pritchard's Sieve back into his bespoke `.lam` format to get a diagram.
Here it is:
let
-- booleans
true = \a b -> a;
false = \a b -> b;
and = \p q -> p q p;
-- arithmetic
mul = \m n f -> m (n f);
is_zero = \n -> n (\x -> false) true;
pred = \n f x -> n (\g h -> h (g f)) (\u -> x) (\u -> u);
succ = \n f x -> f (n f x);
sub = \m n -> n pred m;
leq = \m n -> is_zero (sub m n);
eq = \m n -> and (leq m n) (leq n m);
-- combinators
Z = \f -> (\x -> f (\v -> x x v)) (\x -> f (\v -> x x v));
I = \x -> x;
-- list functions
nil = false;
is_nil = \l -> l (\h t d -> false) true;
cons = \x y z -> z x y;
head = \p -> p true;
tail = \p -> p false;
take = Z (\z n l -> is_nil l (\x -> nil) (\x -> is_zero n nil (cons (head l) (z (pred n) (tail l)))) I);
from = Z (\z n -> cons n (z (succ n)));
map = Z (\z f l -> is_nil l (\x -> nil) (\x -> cons (f (head l)) (z f (tail l))) I);
foldr = \f a l -> Z (\z t -> is_nil t (\x -> a) (\x -> f (head t) (z (tail t))) I) l;
merge = Z (\z xs ys -> is_nil xs ys (is_nil ys xs (leq (head xs) (head ys) (cons (head xs) (z (tail xs) ys)) (cons (head ys) (z xs (tail ys))))));
union = foldr merge nil;
minus = Z (\z xs ys -> is_nil xs nil (is_nil ys xs (leq (head xs) (head ys) (eq (head xs) (head ys) (z (tail xs) (tail ys)) (cons (head xs) (z (tail xs) ys))) (z xs (tail ys)))));
multiples = \n -> map (\x -> mul n x) (from n);
-- finite
finite_primes = cons 2 (cons 3 (cons 5 nil));
composite_stream = union (map (take 4) (map multiples finite_primes));
naturals = (take 9) (from 3);
rest = minus naturals composite_stream;
result = cons 2 rest;
-- infinite variation
primes = Z (\z -> cons 2 (minus (from 3) (union (map multiples z))));
main = result -- N.B. no semicolon on last line
in
main
And lo'
which is... fine I guess.
Footnotes
Hilbert, David. "Mathematische Probleme." Vortrag, gehalten auf dem internationalen Mathematiker-Kongress zu Paris 1900, Gött. Nachr. 1900, 253-297. ↩
redundant to say this in the lambda calculus since all functions are of one variable ↩
Martín Abadi et al. "Types for Scott Numerals." lucacardelli, 1993. ↩
Aaron Stump et al. "Lambda encodings in type theory." University of Iowa, 2014. ↩
Stump, Aaron. "Efficiency of Lambda-Encodings in Total Type Theory." University of Iowa, 2016. ↩
Andrej Bauer, CS StackExchange, 2012. ↩
Schönfinkel, Moses. "On the building blocks of mathematical logic." WolframAlpha, 1924. ↩
MJD, CS StackExchange, 2021. ↩
"Where Did Combinators Come From? Hunting the Story of Moses Schönfinkel." WolframAlpha, 2020. ↩
Keenan, David. "To Dissect a Mockingbird: A Graphical Notation for the Lambda Calculus with Animated Reduction." dkeenan, 1996. ↩
Church, Alonzo and J.B. Rosser. "Some Properties of Conversion*." Princeton, 1936. ↩
"COMS W3261 – Lecture 24: The Lambda Calculus II." Columbia. ↩
Sestoft, Peter. "Demonstrating Lambda Calculus Reduction." University of Copenhagen. ↩
Tromp, John. "John's Connect Four Playground" tromp.github.io. ↩
"Binary lambda calculus." esolongs.org. ↩
de Bruijn, N.G. "Lambda Calculus Notation with Nameless Dummies." Indagationes Math. ↩
Tromp, John. "Functional Bits: Lambda Calculus based Algorithmic Information Theory." tromp.github.io, 2023. ↩
Tromp, John. "Lambda Diagrams." tromp.github.io, 2023. ↩
joker.gif, ibid. ↩
Kim-Ee Yeoh. "[Haskell-cafe] Prime sieve and Haskell demo." mail.haskell.org, 2015. ↩
lol ↩
O'Neill, Melissa. "The Genuine Sieve of Eratosthenes." Harvey Mudd College. ↩ ↩2
Pritchard, Paul. "A sublinear additive sieve for finding prime numbers." Association for Computing Machinery, 1981. ↩
"Explaining the wheel sieve." Acta Informatica, 1982. ↩
https://eprints.whiterose.ac.uk/id/eprint/3784/1/runcimanc1.pdf ↩
Colloquially referred to as "the best technological innovation since insulin." ↩
Greek docs aren't inherently poor, but this guy took the historical aspect too far ↩
Tromp references a number of relatively higher quality parsers on his Information Theory Playground ↩
I hear that over in the Rust community, they kill you with rocks if you don't have doc blocks & a comprehensive README ↩