Annihilating Hilbert's 2nd Problem
TW: this post contains Haskell slander. Or, rather, slander of Haskellers. As has been established,1 Haskell is divinely inspired and correct, but man is involved.
Who of us would not be glad to lift the veil behind which the future lies hidden; to cast a glance at the next advances of our science and at the secrets of its development during future centuries?
– David Hilbert
In a previous post, I said that psycho diagrams are usually indicative of something I'm going to be mega interested in: "It's like glimpsing arcana, or an overly complicated diagram of an extinct theology, a crackhead's projection of metaphysical laws scribbled in soot," and this post on the λ-calculus is no exception, behold:
History
In 1902, David Hilbert presented a set of ten problems to the Bulletin of the American Mathematical Society,2 the second of which asked for a proof that the axioms of arithmetic are consistent, i.e. that a finite number of logical steps based upon them can never lead to contradictory results.
In the coming decades, Alan Turing, Kurt Gödel, and Alonzo Church would all independently answer this question (resoundingly, no).
Turing with his invention of the eponymous universal computation machine, which gave rise to imperative programming and all of computer science. Gödel with his work pioneering general recursive functions and specifically his second incompleteness theorem, which directly addressed Hilbert's second question. And Church, who is credited with the specification of the λ-calculus.
How it works
In the λ-calculus, unlike in imperative/Turing/von Neumann programming, there is no distinction between data and functions. Fundamentally, it is just a notation for substitution which gives rise to ~all of computing, for the same reason that Turing Machines are considered universal computation machines.
λ-calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation that can be used to simulate any single-tape Turing machine and was first introduced by mathematician Alonzo Church in the 1930s as part of his research into the foundations of mathematics.
λ-calculus is composed of three objects:
- Terms: variables, expressions, abstractions, and applications are all considered "terms"
- Abstractions: these are definitions of lambda functions
- Applications: these are the usages of the abstractions with other terms.
A term can be anything – arbitrary variables, function declarations, or invocations are all considered terms. In other words, if $M$ and $N$ are arbitrary terms, then so are $(\lambda x.\, M)$ and $(M\ N)$.
The funky looking thing sharing the parentheses with $a$ is an abstraction, which has the general form $(\lambda \langle var \rangle.\ \langle body \rangle)$,
where the slot to the left of the period delimiter holds the variable said to be "bound" by that lambda, and the body on the right of the period can contain any other term.
Function application is the process of combining or, y'know, applying terms. All applications are binary, taking two terms. The first term is the function, and the 2nd term is the argument, so the conventional Euclidean function notation $f(x)$ would be expressed $(f\ x)$.
Any combination of these three building blocks results in a valid lambda expression.
Currying and Left Associativity
It's worth noting right away that λ-calculus is Left Associative, meaning $f\ x\ y\ z$ parses as $(((f\ x)\ y)\ z)$.
Because of Left Associativity, we can do away with many of the parentheses and, though all proper lambda expressions are functions of a single variable, we can introduce some syntactic sugar called currying to allow us to define multi-variate functions, writing $\lambda x\,y.\ M$ as shorthand for $\lambda x.\ \lambda y.\ M$.
If we look closely, we can see that our function of two arguments returns a function of a single argument. In general, an $n$-argument function spits out a function of $n-1$ arguments.
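To make the sugar concrete, here's a small example of my own, with arbitrary free terms $a$ and $b$:

$$\lambda x\,y.\;x\ y \;\equiv\; \lambda x.\,(\lambda y.\;x\ y), \qquad f\ a\ b \;\equiv\; (f\ a)\ b$$

so "calling" the curried function with $a$ first yields another function, $\lambda y.\;a\ y$, which is still waiting for $b$.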
Evaluation, α-substitution and β-reduction
In the λ-calculus, we use two forms of substitution in order to evaluate our expressions. Recall that λ-calculus is, at its core, just a framework for rules of substitution of some symbols for other symbols.
α-substitution
The first and most rudimentary substitution is aptly named α-substitution, which just lets us rewrite any expression comprised of arbitrary symbols with other (unbound) symbols, e.g. $\lambda x.\,f\ x \to_\alpha \lambda y.\,f\ y$.
α-substitution also yields a notion of α-equivalence, which indicates that two expressions are structurally identical up to renaming of bound variables, e.g. $\lambda x.\,x =_\alpha \lambda z.\,z$.
We can further illustrate what is meant by a "bound" variable via:
In this example, the implied α-substitution is only applied to the bound variable, not the perhaps-poorly-named free term that is being passed as the argument to the abstraction.
β-reduction
β-reduction is a strategy for combining application with abstraction, and is the process by which we actually "evaluate" lambda expressions. Consider an expression of the form $(\lambda x.\ B)\ a$, where $x$ is the only bound variable and any other symbols appearing in the body $B$ are free terms. We can assume that the function body of the abstraction does not contain any nested lambdas which rebind $x$; if it did (unintentionally), we could first apply α-substitution to disambiguate. $a$ is the argument that will be substituted for the variable $x$ in the body of the abstraction. We don't know what $a$ is, since it's not bound by this abstraction, but it's implied that some term called $a$ exists.
To apply the expression, we use the application template: the result of the expression is its body, but with every instance of the bound variable replaced by the argument to the function, which is whatever term immediately follows the abstraction. These steps for substitution define the aforementioned application process.
In our example, $a$ would be the argument to the function defined by the abstraction. To apply, we just swap every instance of $x$ in the body of our abstraction with our argument $a$, and discard the scaffolding of the abstraction.
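For instance, with an abstraction whose body applies its bound variable to a free term $b$ (a toy example of my own choosing):

$$(\lambda x.\;x\ b)\ a \;\to_\beta\; a\ b$$

every $x$ in the body is swapped for the argument $a$, and the $\lambda x$ wrapper is thrown away.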
A slightly more complicated example:
Though this might seem confusing at first, it's just a prefix (🇵🇱) way to express the same concept of functions in standard algebra, e.g. something like $f(x) = x + 1$ becomes $\lambda x.\;x + 1$.
Though we've yet to define addition, multiplication, exponentiation, or even numbers, this example should hopefully add some clarity to function application via β-reduction.
A more complicated example which also employs currying:
Here I've added a lot of training-wheel parens, which quickly becomes pretty noisy, but this is helpful lest we accidentally β-reduce incorrectly like so:
Boolean Arithmetic
With just these three components of terms, abstractions, and application, we can develop boolean arithmetic from which we can also define notions of higher order operations.
We can define $\text{TRUE}$ as a function of two arguments which returns the first argument:
which can be applied like so:
Similarly, we can define false as the binary function which selects its second argument:
e.g.
In this way, True and False are just tuple selectors. We can construct a ternary operator like so:
where $p$ is our predicate condition, $a$ is the case to apply if $p$ is $\text{TRUE}$, and $b$ the case if $p$ is $\text{FALSE}$.
Boolean arithmetic also motivates negation, $\text{NOT}$, a unary3 operator which just maps $\text{TRUE}$ to $\text{FALSE}$ and vice versa. We can define it like so:
We can also define logical conjunction and disjunction:
Here, $p$ is acting as a selector. If $p$ is false, then our whole abstraction needs to be false. Recall that $\text{FALSE}$ selects the 2nd argument, and our two arguments to $p$ are $q$ and $p$. If $p$ is true, then the whole abstraction needs to resolve to $\text{TRUE}$ iff $q$ is also true. This works since $\text{TRUE}$ will select the first value, $q$: if $q$ is true, then the whole abstraction resolves to true, otherwise it will resolve to false.
Similarly, we can derive logical disjunction, $\text{OR}$, by letting $p$ select between itself and $q$.
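For reference, the standard Church encodings of everything in this section, with $p$ and $q$ as the operands (the ternary-selector name $\text{IF}$ is mine):

$$\text{TRUE} \equiv \lambda x.\lambda y.\;x \qquad \text{FALSE} \equiv \lambda x.\lambda y.\;y \qquad \text{IF} \equiv \lambda p.\lambda a.\lambda b.\;p\ a\ b$$

$$\text{NOT} \equiv \lambda p.\;p\ \text{FALSE}\ \text{TRUE} \qquad \text{AND} \equiv \lambda p.\lambda q.\;p\ q\ p \qquad \text{OR} \equiv \lambda p.\lambda q.\;p\ p\ q$$

A quick sanity check: $\text{AND}\ \text{TRUE}\ \text{FALSE} \to_\beta^{*} \text{TRUE}\ \text{FALSE}\ \text{TRUE} \to_\beta^{*} \text{FALSE}$, since $\text{TRUE}$ selects the first of the two arguments it's handed.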
Church Numerals
Till now, we haven't even solidified the notion of quantity in the λ-calculus, despite the inclusion of arithmetic symbols in a few of the examples earlier. Like any extension from boolean logic to number theory, we'll arbitrarily assign the value of zero to one of our boolean values.
The astute reader will note that this is equivalent to our definition of $\text{FALSE}$, which is contrary to most von Neumann conventions where $\text{true} = 1$ and $\text{false} = 0$ are mere integer aliases. This encoding is entirely arbitrary, but standard in this context where we'll use Church encodings. Note, however, that even within the λ-calculus, there are myriad numerical encodings to choose from which have their own uses, including: Scott,4 Parigot,5 and Stump-Fu.6
Since all data are functions in the λ-calculus, we can define quantity as iterated calls to that function. Sadly, λ-calculus is 1-indexed. If we have one of something, we treat that as zero:
Iterating that value once with some function $f$, we get one:
Disregarding the notational abuse of $f^n$ to represent $n$ invocations of $f$, we can see that numbers are just functions which take a function and a variable and then iterate the function on that variable.
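Concretely, the first few Church numerals (standard encoding) are:

$$0 \equiv \lambda f.\lambda x.\;x \qquad 1 \equiv \lambda f.\lambda x.\;f\ x \qquad 2 \equiv \lambda f.\lambda x.\;f\ (f\ x) \qquad 3 \equiv \lambda f.\lambda x.\;f\ (f\ (f\ x))$$

i.e. the numeral $n$ is exactly the function that composes $f$ with itself $n$ times.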
Arithmetic
With numbers in hand, we can define basic arithmetic.
Succ
The first building block of which is the successor operation $\text{SUCC}$, which takes a numeral $n$ and produces the numeral that applies some function $f$ to an argument $x$ one additional time:
to apply $f$ to $x$, $n$ times we just apply $n$ itself, and to achieve the $(n+1)$-th application, we just apply $f$ once more:
We can see that the successor of $2$ is in fact $3$. We begin with a crucial α-substitution, and proceed with application via β-reduction,
which results in three applications of $f$ to $x$, which is perhaps the purest interpretation of what it means to be "three."
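Spelled out with the standard definition, and with the numeral's own binders α-renamed to $g, y$ so they don't collide with $\text{SUCC}$'s:

$$\text{SUCC} \equiv \lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x)$$

$$\text{SUCC}\ 2 \;=\; (\lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x))\ (\lambda g.\lambda y.\;g\ (g\ y)) \;\to_\beta\; \lambda f.\lambda x.\;f\ ((\lambda g.\lambda y.\;g\ (g\ y))\ f\ x) \;\to_\beta^{*}\; \lambda f.\lambda x.\;f\ (f\ (f\ x)) \;=\; 3$$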
plus, minus
We can define addition as iterated application of $\text{SUCC}$. For example, we can add a small constant to some arbitrary $n$ via:
so for arbitrary addition of $m + n$ we just iterate the $\text{SUCC}$ function:
which is colloquially pronounced "I'm succkin'."
Similarly, we can define subtraction in terms of the predecessor function $\text{PRED}$ given by:
from which we can implement subtraction intuitively as iterated application of $\text{PRED}$.
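In the standard encoding (these are the same `pred` and `sub` that show up in the .lam file at the end of the post):

$$\text{PLUS} \equiv \lambda m.\lambda n.\lambda f.\lambda x.\;m\ f\ (n\ f\ x) \quad\text{or equivalently}\quad \lambda m.\lambda n.\;m\ \text{SUCC}\ n$$

$$\text{PRED} \equiv \lambda n.\lambda f.\lambda x.\;n\ (\lambda g.\lambda h.\;h\ (g\ f))\ (\lambda u.\;x)\ (\lambda u.\;u) \qquad \text{SUB} \equiv \lambda m.\lambda n.\;n\ \text{PRED}\ m$$

$\text{PRED}$ is famously awkward (reportedly devised by Kleene in the dentist's chair), and note that $\text{SUB}$ is truncated: subtracting past zero just yields $0$.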
times
Multiplication can be achieved via iterated addition:
e.g.
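Using the standard encoding (which is the same `mul` that appears in the .lam file later), multiplication is just composition of numerals:

$$\text{MULT} \equiv \lambda m.\lambda n.\lambda f.\;m\ (n\ f)$$

$$\text{MULT}\ 2\ 3 \;\to_\beta^{*}\; \lambda f.\lambda x.\;f^{6}\,x \;=\; 6$$

since applying "$f$ three times" twice is applying $f$ six times.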
Exponentiation
Exponentiation is actually simpler than multiplication since we're just applying a number to itself $n$ times.
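In the standard encoding this is just numeral application:

$$\text{POW} \equiv \lambda m.\lambda n.\;n\ m$$

e.g. $\text{POW}\ 2\ 3 = 3\ 2$, i.e. "compose the numeral $2$ with itself three times," which works out to $2^3 = 8$.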
(in)equality and negative numbers
And finally, we can define some other useful relations between terms. First we define a "helper" function to determine if a quantity (such as the difference between two numbers) is zero:
Here we can see why we might want to perform some intermediary α-substitutions, but this expression is simple enough – it doesn't involve so many nested lambdas referencing the same variable that we can't keep the scope of the referenced value unambiguous. Nevertheless, armed with $\text{ISZERO}$, we'll define less-than-or-equal-to by checking whether the (truncated) difference is zero.
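Concretely, with the standard encodings (these mirror the `is_zero` and `leq` in the .lam file near the end of the post):

$$\text{ISZERO} \equiv \lambda n.\;n\ (\lambda x.\;\text{FALSE})\ \text{TRUE} \qquad \text{LEQ} \equiv \lambda m.\lambda n.\;\text{ISZERO}\ (\text{SUB}\ m\ n)$$

A numeral applies its first argument $n$ times: zero never applies it and returns $\text{TRUE}$ untouched, while anything bigger smashes it into $\text{FALSE}$ at least once.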
Here, if not sooner, we might wonder "what about negative numbers?" The Church encoding of the natural numbers thus far has not allowed for negative numbers, so something like $2 - 3$ –as presently defined– is actually still just $0$. In order to model the slew of positive and negative integers, we need to introduce a layer of abstraction similar to how we represent complex numbers as pairs of reals, for reasons that will become clear momentarily. First, we define a $\text{PAIR}$ as:
where $a, b$ are our tuple and $s$ is the selector allowing us to index either the first or the second element, where first and second are aliases for $\text{TRUE}$ and $\text{FALSE}$. Next, we redefine any quantity we might want to model which could be negative as a pair $(p, n)$ representing $p - n$, where $p$ and $n$ are both non-negative Church numerals. A negative number, then, is just the pair with the order of its elements swapped:
E.g. we might represent $2$ as the pair $(2, 0)$ s.t. $2 - 0 = 2$. Thus, $-2$ is just $(0, 2)$. Similarly, we can model the complex numbers (after we've modeled the reals (after we've modeled the rationals)) as pairs where:
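One standard way to spell the pair machinery (names mine; the post's .lam file later uses `cons`/`head`/`tail` for the same construction):

$$\text{PAIR} \equiv \lambda a.\lambda b.\lambda s.\;s\ a\ b \qquad \text{FST} \equiv \lambda p.\;p\ \text{TRUE} \qquad \text{SND} \equiv \lambda p.\;p\ \text{FALSE}$$

$$\text{NEG} \equiv \lambda p.\;\text{PAIR}\ (\text{SND}\ p)\ (\text{FST}\ p)$$

so an integer is a pair of naturals read as "positive part minus negative part," and negation just swaps the two halves.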
Encoding reals is a lot of work and you do not want to actually do it in the λ-calculus. But see for example the `etc/haskell` subdirectory of Marshall for a simple implementation of reals in pure Haskell. This could in principle be translated to the pure λ-calculus.7
Back to equality: we can just check that $m \leq n$ and $n \leq m$.
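Using the $\text{LEQ}$ from above (this mirrors the `eq` in the .lam file below):

$$\text{EQ} \equiv \lambda m.\lambda n.\;\text{AND}\ (\text{LEQ}\ m\ n)\ (\text{LEQ}\ n\ m)$$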
All together, here's a tabular reference of the operations we've defined thus far:
| term | abstraction |
| --- | --- |
| $\text{TRUE}$ | $\lambda x.\lambda y.\;x$ |
| $\text{FALSE}$ | $\lambda x.\lambda y.\;y$ |
| $\text{NOT}$ | $\lambda p.\;p\ \text{FALSE}\ \text{TRUE}$ |
| $\text{AND}$ | $\lambda p.\lambda q.\;p\ q\ p$ |
| $\text{OR}$ | $\lambda p.\lambda q.\;p\ p\ q$ |
| $0$ | $\lambda f.\lambda x.\;x$ |
| $\text{SUCC}$ | $\lambda n.\lambda f.\lambda x.\;f\ (n\ f\ x)$ |
| $\text{PLUS}$ | $\lambda m.\lambda n.\lambda f.\lambda x.\;m\ f\ (n\ f\ x)$ |
| $\text{PRED}$ | $\lambda n.\lambda f.\lambda x.\;n\ (\lambda g.\lambda h.\;h\ (g\ f))\ (\lambda u.\;x)\ (\lambda u.\;u)$ |
| $\text{SUB}$ | $\lambda m.\lambda n.\;n\ \text{PRED}\ m$ |
| $\text{MULT}$ | $\lambda m.\lambda n.\lambda f.\;m\ (n\ f)$ |
| $\text{POW}$ | $\lambda m.\lambda n.\;n\ m$ |
| $\text{ISZERO}$ | $\lambda n.\;n\ (\lambda x.\;\text{FALSE})\ \text{TRUE}$ |
| $\text{LEQ}$ | $\lambda m.\lambda n.\;\text{ISZERO}\ (\text{SUB}\ m\ n)$ |
| $\text{EQ}$ | $\lambda m.\lambda n.\;\text{AND}\ (\text{LEQ}\ m\ n)\ (\text{LEQ}\ n\ m)$ |
Recursion and Combinators
What about recursive functions like the fibonacci sequence, or even just factorial? For the latter, we might try to construct a definition like so:
but right away we run into a problem. Within the body of this function, we don't know what the recursive reference refers to, and since lambda expressions must be self-contained, we cannot have a recursive reference to ourselves. Here, we're using more syntactic sugar to assign this whole expression to a named symbol, but even this isn't defined in pure λ-calculus, and doing away with our useful symbol assignment, we quickly run into the problem of infinitely unrolling our expression deeper into itself. Reverse vore, if you will, horrifying!
However, per the Church-Turing thesis, we know that anything that can be computed via a Turing machine must also be representable in the λ-calculus, so how would we represent a halting, recursive computation such as factorial?
Fixed Point Analysis
In standard algebra, a fixed point of an arbitrary function $f$ is anywhere the function intersects the line $y = x$. In other words, wherever $f(x) = x$. For example, $f(x) = 2x - 1$ has a singular fixed point at $x = 1$.
$f(x) = x^2$ has two fixed points at $x \in \{0, 1\}$, and a function like $f(x) = x + 1$ has no fixed points. In the λ-calculus, we can use this notion in conjunction with our rules of substitution which define function application to delay the application of an otherwise-recursive function call within another function.
For example, if we have some term $M$, we can equivalently express it as the application of an abstraction to $M$,
such that the original function becomes the argument to the abstraction, which then evaluates back to $M$. By using this form in place of a direct reference to $M$ in a recursive function, we can substitute the recursive call with an entirely new function which does not recurse, but instead applies the recursive ~effect
at the appropriate time during function application. There is, of course, a general form for this process. Several, actually.
The Y combinator
The first of note being the $Y$ combinator, which is a function that returns a fixed point for any input function:
and we can iterate the combinator as well to achieve any amount of nested "recursion":
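For reference, the standard definition and its defining unfolding:

$$Y \equiv \lambda f.\;(\lambda x.\;f\ (x\ x))\ (\lambda x.\;f\ (x\ x))$$

$$Y\ g \;=_\beta\; g\ (Y\ g) \;=_\beta\; g\ (g\ (Y\ g)) \;=_\beta\; \cdots$$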
The Turing Fixed Point combinator
Another useful combinator which achieves a nearly identical effect is Turing's fixed point combinator, denoted $\Theta$. Consider the helper function:
The Turing FPC is defined as:
Let's see it in action for some arbitrary function $g$ we want to find the fixed point of.
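Writing $A$ for the helper term (standard presentation):

$$A \equiv \lambda x.\lambda y.\;y\ (x\ x\ y) \qquad \Theta \equiv A\ A$$

$$\Theta\ g \;=\; A\ A\ g \;\to_\beta^{*}\; g\ (A\ A\ g) \;=\; g\ (\Theta\ g)$$

Note that, unlike with $Y$, here $\Theta\ g$ literally β-reduces to $g\ (\Theta\ g)$ rather than merely being β-equivalent to it.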
So, $\Theta\ g$ is a fixed point of $g$. But wait, does this mean we've solved the Riemann hypothesis, since we could use this to find all fixed points of the Riemann zeta function, which is isomorphic to a solution to the open problem?
Not quite... Unlike functions on the real numbers, every function in the λ-calculus is guaranteed to have a fixed point (and we have multiple strategies for finding one), but these fixed points can be arbitrary lambda expressions with no numerical interpretation from which we could recover a real-valued quantity.
For example, while applying a fixed point combinator to some encoding of $\zeta$ does yield an expression whose fixed point is itself a valid lambda expression, it's wholly meaningless spaghetti outside of the context of indirecting recursion.
Recursion Revisited (ha ha ha)
Returning to our stubbed definition of factorial, we now substitute the problematic recursive call with some placeholder function and consider the resultant fixed point of this function:
boom! There's no recursion here, since the expression is finite in closed form, but via β-reduction we've derived the equivalent recursive characteristic function.
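Putting it together, a sketch using the encodings from earlier (the placeholder $f$ stands in for the recursive call):

$$\text{FAC} \equiv Y\ \big(\lambda f.\lambda n.\;\text{ISZERO}\ n\;\;1\;\;(\text{MULT}\ n\ (f\ (\text{PRED}\ n)))\big)$$

Unrolling $Y$ hands the body a copy of itself as $f$, so the "recursive" call is really just another application, and the $\text{ISZERO}$ branch eventually cuts the unrolling off. (This form works under normal-order reduction; an eager evaluator needs the branches wrapped in dummy abstractions, exactly the `(\x -> ...) I` trick the .lam file at the end of the post uses.)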
Other Combinators
Many of these combinators get their names from the Russian logician Moses Schönfinkel,8 who worked in Hilbert's Göttingen group (hence the German names). Not all combinators are used for recursive effect – in general, a combinator is just a closed lambda expression, meaning it has no free variables.
Identity / Identitätsfunktion
The simplest combinator is an abstraction which just returns its argument:
Constant / Konstanzfunktion
Synonymous with $\text{TRUE}$, this is also known as the discarding combinator:
Substitution / Verschmelzungsfunktion
This combinator perhaps gets its name from the role it plays when converting from lambda expressions to combinatory logic statements.9 10 Consider the expression:
If we wanted to find the equivalent SKI-combinator expression for the application of this expression to some argument, the argument would be substituted into both sub-terms in place of the free variable, and then the first result would be applied to the second. However, in combinatory logic, there are no parameter variables, so in place of abstraction, we need a combinator which performs an analogous substitution operation on its arguments. This combinator takes two terms and an argument and performs the combinatorial analogue of substitution before applying one result to the other, e.g.
Alternatively, if we have an expression with a repeated subexpression, we can first apply and to move the duplicated subexpression into the appropriate position s.t. the result has the form:
which we can then Schmelzen or fuse together via the $S$ combinator:
The combination of these combinators gives rise to the SKI calculus,11 which alone is Turing Complete.
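The three combinators in question, for reference:

$$I \equiv \lambda x.\;x \qquad K \equiv \lambda x.\lambda y.\;x \qquad S \equiv \lambda x.\lambda y.\lambda z.\;x\ z\ (y\ z)$$

$S$ pushes its argument $z$ into both of its first two operands and then applies one result to the other, which is exactly the "substitute, then apply" behavior described above. Note that $K$ is just $\text{TRUE}$, and that $S\ K\ K$ behaves as $I$.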
Iota
The universal combinator
Composition / Zusammensetzungsfunktion
Swapping
Duplication
Self-application
Divergence (fork bomb)
Strict Fixed Point
This is another "recursive" combinator. It differs from e.g. the $Y$ combinator insofar as it will work with all the reduction orders suitable for its lazy counterpart (the $Y$ combinator). In addition, it will also work with CBV (call-by-value) and HAP (hybrid applicative) reduction orders, though it is not a drop-in replacement; in order for such expressions to work, they need to be modified so that the evaluation of arguments of conditionals and other terms that need to be lazy is delayed.
We say that it is strict w.r.t. the $Y$ combinator, since the $Y$ combinator defers evaluation until a reducible expression is at the "head" of the statement. The $Y$ combinator is therefore suitable for NOR (normal), HNO (hybrid normal), CBN (call-by-name), and HSP (head spine) reduction orders, but unsuitable for eager/applicative evaluation orders.
Thrush / Vertauschungsfunktion
Reverse application
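For completeness, here are standard definitions for the combinators named in the sections above (the strict fixed-point combinator $Z$ is the same `Z` that shows up in the .lam file at the end of the post):

$$\iota \equiv \lambda f.\;f\ S\ K \qquad B \equiv \lambda f.\lambda g.\lambda x.\;f\ (g\ x) \qquad C \equiv \lambda f.\lambda x.\lambda y.\;f\ y\ x$$

$$W \equiv \lambda f.\lambda x.\;f\ x\ x \qquad \omega \equiv \lambda x.\;x\ x \qquad \Omega \equiv \omega\ \omega$$

$$Z \equiv \lambda f.\;(\lambda x.\;f\ (\lambda v.\;x\ x\ v))\ (\lambda x.\;f\ (\lambda v.\;x\ x\ v)) \qquad T \equiv \lambda x.\lambda f.\;f\ x$$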
β-reducibility
It's worth noting here that an expression which can't be β-reduced any further is said to be in β-normal form. Additionally, throughout the process of β-reduction, there are oftentimes multiple choices of which term to reduce next (as hinted by the preceding discussion of evaluation orders).
For example, here's the reduction graph for which requires 8 β-reductions:
However, at node 1, the choice of which redex to reduce first is arbitrary, so here our graph forks:
And it expands once more before converging again on the resultant term:
This begs the question about which kinds of expressions are reducible, or have a β-normal form. There do exist a number of irreducible expressions such as the omega fork bomb:
Or its infinitely-expanding cousin:
This begs the question: does the order of operations of β-reduction matter? For a recursive function like our factorial, the graph is convergent for the first few hundred β-reductions, but then it forks into two branches, one of which is infinitely divergent since it attempts to "unroll" the recursive call via the combinator rather than leverage the base case. However, the Church-Rosser theorem states that the order does not matter: at any proverbial fork in the road, if there are two distinct reductions or sequences of reductions that can be applied to the same term, then there exists a term that is reachable from both results.12 13
Reduction Strategies
Whether or not our reduction traverses one of these intermediate states is determined by the β-reduction strategy we employ. There are a number of well-researched reduction strategies which lend themselves to different types of expressions.
Thus far, we've mainly been using normal order reduction, which reduces the leftmost, outermost β-redex first before proceeding to the sub-expressions contained within, or following the left-most expression. This effectively results in deferring the evaluation of the arguments to a function until the last possible moment.
The obvious counterpart to normal order reduction is called applicative order, which prioritizes the leftmost innermost expression. Applicative order is sometimes referred to as eager evaluation, which may be unfit for unrolling some expressions, as we saw with the divergent example above.
Consider the following expression under the two reduction orders:
Under normal order (NOR) reduction, the outer abstraction's argument is substituted, unevaluated, for both occurrences of the bound variable in the body before those copies are themselves evaluated:
Note that normal order reduction ends up evaluating that argument twice before getting the final answer. Applicative order (APP) reduction, on the other hand, will first reduce the argument expression and only then substitute the result into the body:
While applicative order seems like the clear winner in this example, it's trivial to construct a counter-example (any recursive function) for which applicative order reduction will not only lose to normal order reduction, but may fail to reduce to a β-normal form entirely, e.g. this expression nicely reduces via normal order:
whereas applicative order will attempt to unroll the fork bomb before exploiting the fact that the outermost abstraction discards its argument entirely.
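A concrete instance of that counter-example shape (my own, using the $\Omega$ fork bomb from above and a free term $z$):

$$(\lambda x.\lambda y.\;y)\ \Omega\ z \;\to_\beta\; (\lambda y.\;y)\ z \;\to_\beta\; z \qquad \text{(normal order)}$$

whereas an applicative-order evaluator insists on first reducing the argument $\Omega = (\lambda x.\;x\ x)(\lambda x.\;x\ x)$, which only ever reduces to itself, so it never reaches the step where the argument gets thrown away.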
Other Reduction Strategies
Along with normal and applicative order, other common strategies include:
- Head Spine (HSP): leftmost outermost, abstractions only reduced when in the head position,
- Hybrid Normal (HNO): a mix between Head Spine and Normal,
- Call by Value (CBV): leftmost innermost, no reductions inside abstractions,
- Hybrid Applicative (HAP): a mix between Call by Value and Applicative order
- Call by Name (CBN): leftmost outermost, no reductions inside abstractions
These seven reduction strategies produce reductions which fall into one of four normal forms: NF (top left), Weak NF (top right), Head NF (bottom left), WHNF (bottom right)14
| reduce args \ reduce under abstractions | yes | no |
| --- | --- | --- |
| yes | APP, NOR, HAP, HNO | CBV |
| no | HSP | CBN |
We can further categorize reduction strategies as pure or uniform (those which only involve that reduction strategy itself) vs. hybrid (those which use a uniform strategy for the reduction of the operator in an application).
| hybrid | uniform |
| --- | --- |
| NOR | CBN |
| HAP | CBV |
| HNO | HSP |
John Tromp
Now, this would be a sane place to end the post, but right around this point – while researching for this post and looking into "elegant" programs implemented in the λ-calculus – I found John Tromp. Where to begin with this guy...
Gregory Chaitin paraphrases John McCarthy on his invention of LISP as: "This is a better universal Turing machine. Let's do recursive function theory that way!" Chaitin continues: "And the funny thing is that nobody except me has really I think taken that seriously."
He was the first to strongly solve Connect 4 (with four-and-a-half years of computation time)15 which warrants its own post in the computational game theory series, but he's also done a shitton of similarly SIGBOVIK-esque work in the λ-calculus.
BLC
The λ-calculus is already the most minimal formalization of ~all of computation, but Tromp took it a step further by encoding all of it ("all" is a stretch, since we know it consists of only like 3 things) in binary which constitutes one of the cooler esolangs17 I've ever encountered.
Starting from first principles, we define:
Inputs and outputs to a BLC program are lists built from $\text{cons}$ pairs, where a pair is just an object holding two other values:
We can use $\text{cons}$ to cons-truct a $\text{nil}$-terminated list like so:
with this definition of , we can leverage other methods to comprehend our lists like so:
suffice it to say, we can build lists and use them. For a finite, binary string and a lambda term , we denote the list to be the -terminated list of its bits (i.e. ):
A λ-machine is a lambda term applied to an input binary stream, and the normalized result of this application is the output. E.g. for the identity function , and input , a λ-machine yields:
indicating that:
So, we can translate bit strings to lambda expressions, big whoop. What about the inverse: translating a lambda expression back into a bit string. We'll denote this process as such that a Universal Turing Machine could operate according to
The question is then: how to concisely encode the necessary, though few, semantics of the λ-calculus: abstraction, application, and variables.
De Bruijn Indices
Named for the Dutch mathematician,18 De Bruijn indices define a name-free notation for variables where variable symbols are instead substituted with the number of nested lambdas up to its binding λ. E.g.
To encode a lambda term in BLC, we express it via DB indices, and insert explicit application operators which are necessary to disambiguate applications from the stream of 1s and 0s which ought to be interpreted as functions/data. So the above expression would be "prepared" as:
Finally, we encode each element of this modified λ-calculus in binary as:
so, the result of encoding any prepared expression is just a stream of bits.
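As a small worked example of the whole pipeline (standard BLC encoding: abstraction ↦ $00$, application ↦ $01$, and a variable with de Bruijn index $i$ ↦ $i$ ones followed by a zero):

$$\lambda x.\;x \;\leadsto\; \lambda\,1 \;\leadsto\; 00\,10 = \texttt{0010}$$

$$\lambda x.\lambda y.\;x \;\leadsto\; \lambda\,\lambda\,2 \;\leadsto\; 00\,00\,110 = \texttt{0000110}$$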
The self-interpreter for BLC19 in pre-BLC is given by
which satisfies –for all closed terms –
and is a mere 206 bits long.
A Universal Turing Machine in BLC can be expressed as:
Visualizing λ
But he didn't stop there, oh no no no dear reader. This boy has got the tism and I'm buying what he's selling. Tromp devised a means of visualizing lambda20 expressions where:
- Abstractions are horizontal bars
- Variables are vertical bars spawning from the lambda they're bound to
- Applications are expressed by the branching structure connecting abstractions and variables
We'll begin by stepping through the fundaments and then applying Tromp's rules for lambda diagrams to some of the more complex expressions we've derived so far.
Furthermore, we can trace the application of more structured –though yet unreduced– lambda expressions. For example, the arbitrary function reduction:
(which can't be β-reduced further without additional terms, since this expression effectively selects the third element of a triple, and without knowing anything else about the free term , we can't simplify any further). Nevertheless, this reduction can be illustrated as:
A less-abstract21 example might be the β-reduction of logical negation which, without any arguments, resembles:
A pseudo-arithmetic operation like the successor function has the argument-less structure:
and for some (sanely small) argument, say $2$, we can pipe it in:
which is three! (not factorial, I'm just excited). Speaking of factorial, though, here's that:
Elegant λ programs
The problem I'm most interested in which seems least insane to attempt to implement in the pure lambda calculus and then attempt to diagram is the Sieve of Eratosthenes. Here's about where I would typically outsource an explanation to/from Wikipedia, or some other contextually relevant source, such as, idk, the Haskell website, which currently has:
primes = filterPrime [2..] where
filterPrime (p:xs) =
p : filterPrime [x | x <- xs, x `mod` p /= 0]
on the masthead as a sample of Haskell's syntax, but once upon a time (2015) was instead expressed:
primes = sieve [2..]
where sieve (p:xs) =
p : sieve [x | x <- xs, x `mod` p /= 0]
which incorrectly!!!! implies that the technique being implemented was a sieve, which sparked much derision23 amongst the most insufferable demographic of programmers: FP Haskellers.
So, we will briefly24 take a detour to completely understand the algorithm, its purely functional implementations (runtime complexity be damned), and pure-optimizations that we can make before taking a stab at implementing it ourselves, and building some tooling along the way to help keep me sane.
The Real Sieve of Eratosthenes
This section is largely informed by Melissa O'Neill's paper entitled The Genuine Sieve of Eratosthenes,25 which lampoons false sievery and discusses a number of not-quite-strictly-λ-pure optimizations (by which I mean: use of data structures that I cbf to implement and retain a legible diagram for).
The sieve function –as described by the 2nd century B.C. Greek mathematician– is as follows:
- Begin with a collection of numbers-to-be-sieved, sorted in ascending order, e.g. $[2, 3, 4, \ldots, n]$,
- Starting with the first number $p = 2$, declare it to be prime, and eliminate all multiples of that number in our list, starting from $p^2$ (e.g. $4, 6, 8, \ldots$ would be removed),
- Set $p$ to be the next non-eliminated number after $p$ and repeat.
For a fixed-size list, this algorithm terminates once $p$ exceeds $\sqrt{n}$. This process crucially differs from the naïve trial division algorithm claiming prime real estate over on Haskell dot org since it does not employ any notion of division. While trial division is asymptotically superior to a naive, list-based rendering of Eratosthenes' methodology,
we can cut him some slack since the notion of sieving a limit like ~a million probably would've knocked his sandals off. Additionally, we can make some slight tweaks to our Greco-faithful implementation which, without changing the functional effect of any of the steps, can save us a number of operations proportional to the input size (which will translate to –I shit you not– billions of β-reductions when computing more than a few dozen primes). A minimal imperative sketch of the steps as just described follows below.
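Here's that sketch: an array of flags and in-place crossing off, in Haskell for consistency with the rest of the post. This is entirely my own scaffolding (not from O'Neill's paper), just to pin down the algorithm before the functional gymnastics begin.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- The textbook sieve: for each surviving p, cross off p^2, p^2+p, ... up to n.
sieveUpTo :: Int -> [Int]
sieveUpTo n = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = runSTUArray $ do
      arr <- newArray (2, n) True
      forM_ [2 .. floor (sqrt (fromIntegral n :: Double))] $ \p -> do
        isPrime <- readArray arr p
        when isPrime $
          forM_ [p * p, p * p + p .. n] $ \m ->
            writeArray arr m False
      return arr

main :: IO ()
main = print (sieveUpTo 30) -- [2,3,5,7,11,13,17,19,23,29]
```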
Whereas the original algorithm crosses off all multiples of a prime at once, we perform these "crossings off" in a lazier way: crossing off just-in-time. For this purpose, we will store a table in which, for each prime $p$ that we have discovered so far, there is an "iterator" holding the next multiple of $p$ to cross off. Thus, instead of crossing off all the multiples of, say, $17$, at once (impossible, since there are infinitely many for our limit-free algorithm), we will store the first one (at $289$; i.e., $17^2$) in our table of upcoming composite numbers. When we come to consider whether $289$ is prime, we will check our composites table and discover that it is a known composite with $17$ as a factor, remove $289$ from the table, and insert $306$ (i.e., $289 + 17$). In essence, we are storing [JIT] "iterators" in a table keyed by the current value of each iterator.
O'Neill leverages Haskell's `Data.Map` to implement this approach:
sieve xs = sieve' xs Map.empty
  where
    sieve' []     table = []
    sieve' (x:xs) table =
      case Map.lookup x table of
        Nothing    -> x : sieve' xs (Map.insert (x*x) [x] table)
        Just facts -> sieve' xs (foldl reinsert (Map.delete x table) facts)
          where
            reinsert table prime = Map.insertWith (++) (x+prime) [prime] table
which is asymptotically better than the trial division algorithm for the same fixed input length. Also, I love that Professor O'Neill accounts for the non-unit cost of the arithmetic division operations utilized by unfaithful sieves, which would offset the cost of using a tree data structure. She continues to improve the performance of this approach by swapping out the native `Data.Map` for a bespoke `PriorityQueue`, which is clever since the proposed algorithm only ever needs to check the least element of the collection, i.e. the head.
She supposes that, given the existence of a `PriorityQueue` with the following API:
empty :: PriorityQueue k v
minKey :: PriorityQueue k v -> k
minKeyValue :: PriorityQueue k v -> (k,v)
insert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
deleteMinAndInsert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
we can milk Eratosthenes a bit more with minor adjustments:
sieve []     = []
sieve (x:xs) = x : sieve' xs (insertprime x xs PriorityQueue.empty)
  where
    insertprime p xs table = PriorityQueue.insert (p*p) (map (* p) xs) table
    sieve' []     table = []
    sieve' (x:xs) table
      | nextComposite <= x = sieve' xs (adjust table)
      | otherwise          = x : sieve' xs (insertprime x xs table)
      where
        nextComposite = PriorityQueue.minKey table
        adjust table
          | n <= x    = adjust (PriorityQueue.deleteMinAndInsert n' ns table)
          | otherwise = table
          where
            (n, n':ns) = PriorityQueue.minKeyValue table
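O'Neill's paper simply assumes a `PriorityQueue` exists; as a stand-in, here's a deliberately naive, list-backed sketch of that API (my own, not hers; a real heap is what you'd actually want). The one behavioral detail that matters is that duplicate keys must be allowed, since two different primes' iterators can momentarily sit on the same composite.

```haskell
module PriorityQueue
  ( PriorityQueue
  , empty
  , minKey
  , minKeyValue
  , insert
  , deleteMinAndInsert
  ) where

-- Entries kept sorted by key; duplicates allowed. O(n) insert, which is fine
-- for illustration but not for performance.
newtype PriorityQueue k v = PQ [(k, v)]

empty :: PriorityQueue k v
empty = PQ []

minKey :: PriorityQueue k v -> k
minKey (PQ ((k, _) : _)) = k
minKey _                 = error "minKey: empty queue"

minKeyValue :: PriorityQueue k v -> (k, v)
minKeyValue (PQ (kv : _)) = kv
minKeyValue _             = error "minKeyValue: empty queue"

insert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
insert k v (PQ kvs) = PQ (go kvs)
  where
    go []              = [(k, v)]
    go (e@(k', _) : es)
      | k <= k'        = (k, v) : e : es
      | otherwise      = e : go es

deleteMinAndInsert :: Ord k => k -> v -> PriorityQueue k v -> PriorityQueue k v
deleteMinAndInsert k v (PQ [])        = insert k v (PQ [])
deleteMinAndInsert k v (PQ (_ : es))  = insert k v (PQ es)
```

Imported qualified, this slots straight into the `PriorityQueue.*` calls in the sieve above.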
This approach also cleanly lends itself to lazily incrementing each iterator by $2p$ rather than $p$ (which would otherwise require redundant evaluation of even numbers), whereas imperative approaches would require additional, asymptotically non-trivial amounts of extra bookkeeping.
O'Neill notes that this simple elimination of even numbers improves performance by a noticeable constant factor (still not factorial, sorry), and that we can eke out some further performance boosts by skipping the roughly 3/4s of the remaining composites which are divisible by 3, 5, and/or 7.
As we saw above, to produce numbers that are not multiples of 2, we simply begin at 3 and then keep adding 2. To avoid multiples of both 2 and 3, we can begin at 5 and alternately add 2, then 4. We can visualize this technique as a wheel of circumference 6 with holes at distances of 2 and 4 rolling up the number line. In general, adding an additional prime p to the wheel multiplies the circumference of the wheel by p, and removes every pth remaining composite. Thus, there are usually diminishing returns for large wheel sizes: our wheel for the first four primes has circumference 210 (i.e., 2 × 3 × 5 × 7) with 48 holes, whereas the wheel for eight primes has circumference 9,699,690 and 1,658,880 holes, but eliminates fewer than 7% of the remaining composites.
A small wheel can be implemented as:
wheel2357 = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wheel2357
spin (x:xs) n = n : spin xs (n + x)
primes = 2 : 3 : 5 : 7 : sieve (spin wheel2357 11)
yielding the following performance gains:
And, having gracefully dunked on The Sleight on Eratosthenes, O'Neill concludes with a list-based faithful implementation of the Sieve shared by a friend and fellow prime-enthusiast Paul Pritchard (and nicely animated here26 by the chad himself) which I can bf to translate from Haskell to the λ-calculus:
primes = 2 : ([3..] `minus` composites)
  where
    composites = union [multiples p | p <- primes]

multiples n = map (n*) [n..]

(x:xs) `minus` (y:ys)
  | x <  y = x : (xs `minus` (y:ys))
  | x == y = xs `minus` ys
  | x >  y = (x:xs) `minus` ys

union = foldr merge []
  where
    merge  (x:xs) ys = x : merge' xs ys
    merge' (x:xs) (y:ys)
      | x <  y = x : merge' xs (y:ys)
      | x == y = x : merge' xs ys
      | x >  y = y : merge' (x:xs) ys
which makes careful use of laziness, since taking the union of the infinite list of infinite lists $[[4,6,8,10,\ldots],\ [9,12,15,18,\ldots],\ [25,30,35,40,\ldots],\ \ldots]$
is tricky unless we exploit the fact that the first element of the result is the first element of the first infinite list, hence the mindful definition of `union`. This implementation has a time complexity that is worse than trial division by a small factor, but past this point –as we veer into incoming λ-traffic– we don't particularly care anymore. Pritchard has numerous publications regarding sub-linear faithful sieves.27,28
Anyways, O'Neill leaves a wheel + list-based implementation of Pritchard's Sieve as an exercise to the reader.29
you psycho.
Sapere Aude: a devlog
With an end goal of producing the Tromp diagram for Pritchard's list-based implementation of the faithful Sieve (and then, depending on how many white hairs I have after the fact, adding a wheel to the mix), let us first begin with the diagramming codebase itself.
Paul Brauner graciously wrote this Haskell script to parse (custom) `.lam` files and translate them into an animation of the β-reduction to normal form (if any exists. If it doesn't exist, good luck recapturing that disk space bc this bad boi is gonna try).
Setting up this repo alone was a trial and a half since its last commit was May 2018. Notably, the first commercially available Apple Silicon chips30 weren't released until November 2020. Unsurprisingly, the versions of Stack, Cabal, and GHC Paul used are incompatible with the `arm64` instruction set, and are no longer receiving LTS. Upgrading to the oldest arm-compatible release of each of the requisite build tools breaks a gazillion dependencies that I –already woefully unfamiliar with Haskell's ecosystem– am unwilling to unwind; it ended up being easier to blow the dust off ye' old gaming box than sort out virtualization compatibility or containerization (alpine and VirtualBox merely added layers of indirection before running into the underlying issue that my `arm64` CPU simply cannot produce x86_64 instructions – whoda thunk it).
After getting the visualization repo set up, the next task was to translate Pritchard's Haskell-ish pseudo code into the `.lam` format which Paul's program was expecting. His program is great if your lambda expressions are valid, but less helpful if you are prone to introducing errata in the sea of parens and recursion necessary to implement the sieve as described. To draft an implementation, I turned first to a number of browser-based λ-calculus interpreters/parsers rather than fight more uphill battles of compatibility/tooling/poorly documented31 toy repos, but I quickly ran into a problem.
At this point, I understood that even "trivial" computations could unfurl into thousands of costly β-reductions, but I (clueless) had (naïve) faith in my browser's ability to help validate my implementations of even the basic requisite functions. https://lambster.dev/ in particular was "a pleasure" to use, and https://www.allisons.org/ll/FP/Lambda/ was surprisingly robust (the absence of any styling/navigation assistance on this site let me know that I was working with a real FPer), however neither was robust enough.
Alas, I resumed skimming repos, looking for "C/C++" or anything else that screamed "speed", with a small arithmetic expression as my benchmark.32 After determining that most of the "optimized" or "memory efficient" parsers were painfully old/incompatible, or just outright broken (back in my day, you got an F if your program segfaulted) even if I did manage to get them to build, I stumbled into the light.
God bless Rust
https://github.com/orsinium-labs/rlci is fire. Go look at it and tell me it's not fire.
Splendid for handling the basics, but 1) I dislike its syntax choices (significant whitespace – because, sans any other syntactic sugar like currying, we can actually do away with the periods altogether) and 2) I cbf to inject my own analytics/debugging crap into its AST parser. However, having tasted the splendor of a well-maintained codebase,33 I refined my search for a more hands-on λ-calculus Rust tool.
https://github.com/ljedrz/lambda_calculus is (almost) everything I wanted.
Naive Eratosthenes
For our sieve implementation, we'll need:
primes = Y
(λself.
λf.cons two (
minus (from three) (union (map multiples self))
)
)
And, quite frankly, the further I got, the more I realized I might have bitten off more than I could chew. While ljedrz provided a host of user-friendly APIs in his lib, it still required a number of modifications/additions before we could just drop in this algorithm. My (admittedly messy) fork of the repo with these additions can be found here. I have no ambitions of getting it merged; I have accepted my station in life as a python-peon.
So I found myself with what I thought was a faithful sieve implementation, but my program continued to overflow the stack when attempting to produce even a trivial number of primes – which is about when the white hairs started to pop up.
Rubber Duckies (aka hostage coworkers) >> Stack Overflow
[peter.murphy] So my sieve works for a finite amount of primes
I found the issue
It’s the reduction strategy
(because there’s several to choose from)
And because of the nested recursive calls (to `union` and `merge` and `primes` itself), I would have to define a custom reduction strategy. E.g. doing it by hand it works, however, to get even just 2, 3, 5, 7 it takes 1.1mil reductions
[will.hombre] There isn't a "Don't be stupid" strategy built-in?
[peter.murphy] Not really… like normal order reduction is the laziest, which is usually safe
except when you reduce to a point where a recursive call is in the “head” position, but it would be more beneficial to unroll the argument to the head instead of the head itself
which I have at least one instance of:
`multiples` and `primes` both spawn infinite lists on the same "line"
[O'Neill, citing Bird]
This code makes careful use of laziness. In particular, Bird remarks that “Taking the union of the infinite list of infinite lists [[4,6,8,10,..], [9,12,15,18..], [25,30,35,40,...],...] is tricky unless we exploit the fact that the first element of the result is the first element of the first infinite list. That is why union is defined in the way it is in order to be a productive function.”25
[peter.murphy] and while I get why this highlighted part is crucial and should lend itself to normal order reduction, I think you really want to unroll “one step” of multiples and primes in lockstep at a time
BFS the redux tree so to speak
but then at the same time, you need to prioritize a `take n` which will be in the head position, and so the trick is to get like 5 "steps" into `multiples` and `primes` each (exactly like we had on the whiteboard) to give you enough of a stream to `take` from, and there's no general reduction order for this hyper specific case lol
So, finally we arrive at an unfortunately finite example:
use lambda_calculus::{
    data::{
        list::pair::{cons, from, map, minus, take, union},
        num::church::mul,
    },
    *,
};

fn main() {
    let multiples = abs({
        app!(
            map(),
            abs!(1, app!(mul(), Var(2), Var(1))), // n * x
            from(Var(1)) // Generate the list starting from n
        )
    });

    // [2, 3, 5]
    let finite_primes = vec![
        2.into_church(),
        3.into_church(),
        5.into_church()
    ].into_pair_list();

    // builds an (infinite) list of infinite lists: [[4, 6, 8, ...], [9, 12, 15, ...], [25, 30, 35, ...]]
    let finite_composite_stream = app!(
        union(),
        app!(
            map(),
            app(take(), 4.into_church()),
            // Apply map to multiples for the current prime
            app!(map(), multiples.clone(), finite_primes.clone())
        )
    );

    // Generate the next prime candidates from the natural numbers
    let natural_numbers = app(app(take(), 9.into_church()), from(3.into_church()));

    // Subtract composites from natural numbers to get the primes
    let rest = app!(
        minus(),
        natural_numbers.clone(),
        finite_composite_stream.clone()
    );

    // Return the result by appending the base case (2) with the remaining primes
    let result = app!(cons(), 2.into_church(), rest.clone());

    println!("\n\n{}", beta(result, NOR, 0));
}
which produces the satisfactory output (after nearly a million reductions):
λa.a
  (λb.λc.b (b c))                                             // 2
  (λb.b (λc.λd.c (c (c d)))                                   // 3
  (λc.c (λd.λe.d (d (d (d (d e)))))                           // 5
  (λd.d (λe.λf.e (e (e (e (e (e (e f)))))))                   // 7
  (λe.e (λf.λg.f (f (f (f (f (f (f (f (f (f (f g))))))))))）  // 11
  (λf.λg.g)))))                                               // nil terminator
We can compare the efficiency of the various reduction strategies:
and observe that there's little noticeable difference in expression size or number of reductions required to obtain a β-normal form for our finite expression. Pure applicative order stack-overflows (unsurprisingly) somewhere between 3,000 and 4,000 β-reductions, with an expression size of 43,857 chars, and hybrid applicative order goes foom after 11,000 reductions:
Having verified with a finite example, I am satisfied to plug the infinite example into the diagramming application.
So now I just had to translate my working implementation of Pritchard's Sieve back into his bespoke `.lam` format to get a diagram.
Here it is:
let
-- booleans
true = \a b -> a;
false = \a b -> b;
and = \p q -> p q p;
-- arithmetic
mul = \m n f -> m (n f);
is_zero = \n -> n (\x -> false) true;
pred = \n f x -> n (\g h -> h (g f)) (\u -> x) (\u -> u);
succ = \n f x -> f (n f x);
sub = \m n -> n pred m;
leq = \m n -> is_zero (sub m n);
eq = \m n -> and (leq m n) (leq n m);
-- combinators
Z = \f -> (\x -> f (\v -> x x v)) (\x -> f (\v -> x x v));
I = \x -> x;
-- list functions
nil = false;
is_nil = \l -> l (\h t d -> false) true;
cons = \x y z -> z x y;
head = \p -> p true;
tail = \p -> p false;
take = Z (\z n l -> is_nil l (\x -> nil) (\x -> is_zero n nil (cons (head l) (z (pred n) (tail l)))) I);
from = Z (\z n -> cons n (z (succ n)));
map = Z (\z f l -> is_nil l (\x -> nil) (\x -> cons (f (head l)) (z f (tail l))) I);
foldr = \f a l -> Z (\z t -> is_nil t (\x -> a) (\x -> f (head t) (z (tail t))) I) l;
merge = Z (\z xs ys -> is_nil xs ys (is_nil ys xs (leq (head xs) (head ys) (cons (head xs) (z (tail xs) ys)) (cons (head ys) (z xs (tail ys))))));
union = foldr merge nil;
minus = Z (\z xs ys -> is_nil xs nil (is_nil ys xs (leq (head xs) (head ys) (eq (head xs) (head ys) (z (tail xs) (tail ys)) (cons (head xs) (z (tail xs) ys))) (z xs (tail ys)))));
multiples = \n -> map (\x -> mul n x) (from n);
-- finite
finite_primes = cons 2 (cons 3 (cons 5 nil));
composite_stream = union (map (take 4) (map multiples finite_primes));
naturals = (take 9) (from 3);
rest = minus naturals composite_stream;
result = cons 2 rest;
-- infinite variation
primes = Z (\z -> cons 2 (minus (from 3) (union (map multiples z))));
main = result -- N.B. no semicolon on last line
in
main
And lo'
which is... fine I guess.
Footnotes
Hilbert, David. "Mathematische Probleme." Vortrag, gehalten auf dem internationalen Mathematiker-Kongress zu Paris 1900, Gött. Nachr. 1900, 253-297. ↩
redundant to say this in the lambda calculus since all functions are of one variable ↩
Martín Abadi et al. "Types for Scott Numerals." lucacardelli, 1993. ↩
Aaron Stump et al. "Lambda encodings in type theory." University of Iowa, 2014. ↩
Stump, Aaron. "Efficiency of Lambda-Encodings in Total Type Theory." University of Iowa, 2016. ↩
Andrej Bauer, CS StackExchange, 2012. ↩
Schönfinkel, Moses. "On the building blocks of mathematical logic." WolframAlpha, 1924. ↩
MJD, CS StackExchange, 2021. ↩
"Where Did Combinators Come From? Hunting the Story of Moses Schönfinkel." WolframAlpha, 2020. ↩
Keenan, David. "To Dissect a Mockingbird: A Graphical Notation for the Lambda Calculus with Animated Reduction." dkeenan, 1996. ↩
Church, Alonzo and J.B. Rosser. "Some Properties of Conversion*." Princeton, 1936. ↩
"COMS W3261 – Lecture 24: The Lambda Calculus II." Columbia. ↩
Sestoft, Peter. "Demonstrating Lambda Calculus Reduction." University of Copenhagen. ↩
Tromp, John. "John's Connect Four Playground" tromp.github.io. ↩
"Binary lambda calculus." esolongs.org. ↩
de Bruijn, N.G. "Lambda Calculus Notation with Nameless Dummies." Indagationes Math. ↩
Tromp, John. "Functional Bits: Lambda Calculus based Algorithmic Information Theory." tromp.github.io, 2023. ↩
Tromp, John. "Lambda Diagrams." tromp.github.io, 2023. ↩
joker.gif, ibid. ↩
Kim-Ee Yeoh. "[Haskell-cafe] Prime sieve and Haskell demo." mail.haskell.org, 2015. ↩
lol ↩
O'Neill, Melissa. "The Genuine Sieve of Eratosthenes." Harvey Mudd College. ↩ ↩2
Pritchard, Paul. "A sublinear additive sieve for finding prime numbers." Association for Computing Machinery, 1981. ↩
"Explaining the wheel sieve." Acta Informatica, 1982. ↩
https://eprints.whiterose.ac.uk/id/eprint/3784/1/runcimanc1.pdf ↩
Colloquially referred to as "the best technological innovation since insulin." ↩
Greek docs aren't inherently poor, but this guy took the historical aspect too far ↩
Tromp references a number of relatively higher quality parsers on his Information Theory Playground ↩
I hear that over in the Rust community, they kill you with rocks if you don't have doc blocks & a comprehensive README ↩