Honor Among Sieves

I'm revisiting my implementation of the prime number sieve from this post since I was losing sleep over its sheer size due to the numerical format of its output.

A Preface About Finite State Automata

A mealy machine¹ is a finite state machine (though it doesn't actually have to be infinite if you get clever about positioning of ellipses in your proofs), which differs from the perhaps-conventional understanding of a FSM in that its transition dynamics are determined explicitly by both its inputs and the current state rather than just the current state.

We express a Mealy machine as a tuple $(S, s_0, \Iota, \Omicron, \tau, R)$ ² where:

$S$ is the set of all possible states,
$s_0 \in S$ is the initial state,
$\Iota$ is the set of all possible inputs,
$\Omicron$ the set of all possible outputs,
$\tau$ is the transition function $\tau : S \times \Iota \rightarrow S$ mapping state-input pairs to the successive state,
$R$ is the output ("reward") function $R : S \times \Iota \rightarrow \Omicron$ mapping state-input pairs to the corresponding output.

It proved useful to keep this in the back of my head when mulling over Tromp's sieve.

Digesting Diagrams

On his site, Tromp provides a diagram of the alternate form of the prime number sieve:

which is far more compact and visually pleasing than the behemoth I cooked up:

This is largely because of the nature of our respective sieve's outputs. Whereas Tromp's outputs an infinite list of booleans corresponding to whether or not the positive integer number at that index is prime, mine outputs the primes themselves – and church-encoded numerals (or any reliance thereof) naturally results in much larger diagrams.

Scattered throughout his Algorithmic Information Theory repo are various .lam programs related to prime number sievery, and I believe this is the one which produces an infinite binary stream where a '1' indicates that the number at that index is prime, and a '0' indicates it's composite:

let
	zero  = \x\y.x;
	one   = \x\y.y;
	consz = \xs\p.p zero xs;
	oneS  = \cont\x\xs\p.p x (xs cont);
	zeroS = \cont\x\xs.consz (xs cont);
	fix   = \f.(let x=f x in x);
	ones  = fix (\ones\p.p one ones);
	sieve = fix (\sv\n\x\xs\p.p x (x xs (xs (fix n)) (sv (\r.oneS (n r)))));
  main  = consz (consz ones (sieve zeroS))
in main

This function can be visualized as the above diagram in conjunction with Brauner's illustration tool when translated to the format the Brauner's script is expecting (->s instead of .s, and with curried functions instead of multiply nested \<var> definitions).

However, it appears as though Brauner's tool uses applicative order reduction where reduceOneLmIm presumably means leftmost innermost (i.e. eager) evaluation, which –as detailed in the prelude³ post– is unfit for recursively defined functions depending on the placement of the recursive call.

How, then, did Tromp manage to produce the infinitely expanding list of booleans corresponding to their indices primality? To answer this question, we'll first annotate and trace⁴ his implementation.

Rosetta

Constants

The first few definitions are comparatively straightforward, zero and one correspond to true and false.

Streams, Constructors, and combinators

consz = \xs\p.p zero xs;

consz (read: cons zero) builds a stream node where the head is hardcoded to zero.

Recall that cons is defined as

cons = \x\xs\p.p x xs;

where p is some predicate returning true (fst) or false (snd), allowing us to conditionally select the head or tail of our cons cell. consz, then, simply hardcodes the first argument of cons to be zero.

oneS and zeroS are stream transformers defined as:

oneS  = \cont\x\xs\p.p x (xs cont);
zeroS = \cont\x\xs.consz (xs cont);

oneS takes as arguments:

a function cont to transform (apply to) the tail (xs) of the stream,
a stream of the form x:xs,
and a constructor p such as cons or consz.

zeroS is similar with the exception that it hardcodes the constructor p to be our previously defined consz constructor to prepend a '0' the stream.

These two stream transformers will be used conditionally to keep/eliminate primes/composites from our stream of naturals.

Next we have the non-lambda-pure fix:

fix   = \f.(let x=f x in x);

which is just the syntactically-glucosed Z combinator:

Z = \f -> (\x -> f (\v -> x x v)) (\x -> f (\v -> x x v));

the rub

ones  = fix (\ones\p.p one ones);

creates an infinite stream of one values to be used as seed values to the sieve (assume all numbers are prime until proven guilty of composition, and replaced with a zero via consz).

sieve = fix (\sv\n\x\xs\p.
			p x (
				x xs (xs (fix n))
				(sv (\r.oneS (n r)))
			)
		);

this is the beast.

sv is the recursive self-reference,
n is a function which generates a mask to apply to the stream – akin to the wheel of composite elimination, initially n is zeroS,
x:xs is our output stream, initially set to be 0:0:oneS,
and again p is the constructor: cons or consz.

The body of this function re-constructs the existing stream of naturals after applying the mask produced by n that's been constructed so far, and updating the mask as needed if the current number being processed proves to be prime (thus eliminating all future multiples from consideration).

┌───────────────┐
│ Stream input  │
└──────┬────────┘
       ↓
┌──────┴───────┐
│ Apply mask n │
└──────┬───────┘
       ↓
┌──────┴───────┐
│  Build node  │  (using p)
└──────┬───────┘
       ↓
┌──────┴───────┐
│ Update mask  │ (n → oneS (n r))
└──────┬───────┘
       ↓
┌──────┴───────┐
│ Recurse sv   │ (with updated n)
└──────────────┘

Let's see how that works in practice.

Suppose we've determined that the head is prime – as is the case for the first call to sieve for x = 2. The sieve needs to update the filter n such that all multiples of two are flagged as composite. The body of this function accomplishes that via:

retaining the current value: p x (...)
Applying the current stream to itself and the filter mask: x xs (xs (fix n))
Recursing into the sieve with a new filter function: sv (\r -> oneS (n r))

Every recursion of sv pushes another 1 into the filter, delaying the next 0 that zeroes out a multiple.

E.g. for an initial stream of oneS corresponding to the naturals:

[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...]

at each step, if a number is marked as '1' by the mask produced by n, it's considered prime, and we update the mask such that all of the multiples of that number are marked as composite. This is achieved by prepending a '1' to n (which, recall, is not simply a stream, but rather a stream constructor)⁵ to "count" when the next multiple of x occurs. Prepending/offsetting the mask is accomplished via:

n' = \r -> oneS (n r)

this means "add a '1' immediately before whatever the previous mask-producing-function n said." The first invocation, then, produces a function which masks even numbers since ns initial value is just zeroS, and so

n' = \r -> oneS (n r)
   = \r -> oneS (zeroS r)
   => 1 0 1 0 1 0 1 0 ...

So, when applied to the tail of our stream `xs:

current	↓
index	2	3	4	5	6	7	8	9	10
x:xs	1	1	1	1	1	1	1	1	1
n	1	0	0	0	0	0	0	0	0

the first mask is created, updating n to filter even numbers by prepending a 1 to each entry:

current	↓
index	2	3	4	5	6	7	8	9	10
x:xs	1	1	1	1	1	1	1	1	1
n	1	1	0	1	0	1	0	1	0

now applying n to xs:

current	↓
index	2	3	4	5	6	7	8	9	10
x:xs	1	1	1	1	1	1	1	1	1
n	1	1	0	1	0	1	0	1	0
x:xs'	1	1	0	1	0	1	0	1	0

Then, recursing to the next index, we encounter another x=1 scenario at index 3:

current		↓
index	2	3	4	5	6	7	8	9	10
x:xs	1	1	0	1	0	1	0	1	0
n	1	1	0	1	0	1	0	1	0

We update n by again prepending another oneS before each entry – where "entries" are function compositions, so n (and its successors) can be thought of as groups:

n   = (1 0) (1 0) (1 0) (1 0) (1 0) ...
n'  = 1 (1 0) 1 (1 0) 1 (1 0) 1 (1 0) 1 (1 0) ...
n'' = 1 (1 1 0) 1 (1 1 0) 1 (1 1 0) 1 (1 1 0) 1 (1 1 0) ...
.
.
.

and at time of evaluation, the collection of ns are folded via logical conjunction such that the mask applied at index 3 is the composition of n2 ∘ n3. So conceptually we have

n2      = 1 0 1 0 1 0 1 0 1 ...
n3      = 1 1 0 1 1 0 1 1 0 ...
n2 ∘ n3 = 1 0 0 0 1 0 1 0 0 ...

However, we must note also that each successive iteration of n is offset by the depth of the next multiple of the prime its generated from, so n3 begins at index 6:

	0	1	2	3	4	5	7	8	9	10
n	0	0	0	0	0	0	0	0	0	0
n2	1	1	1	1	0	1	1	0	1	0
n3	1	1	1	1	1	1	1	1	0	1
n2 ∘ n3	1	1	1	1	0	1	1	0	0	0

current		↓
index	2	3	4	5	6	7	8	9	10
x:xs	1	1	0	1	0	1	0	1	0
n	1	1	0	1	0	1	0	0	0
x:xs'	1	1	0	1	0	1	0	0	0

okay swag, and the next prime (5) will improve the mask beginning at index 25, 7 at 49, etc. So we get diminishing returns (cardinally speaking), but returns nonetheless.

Additionally, by accomplishing recursive sievery by pushing offsets of oneS into our mask generator n, we never have to realize the cost of eliminating infinitely many multiples of infinitely many primes, and instead only ever have to evaluate a single function call (the composition of n) on the head of our stream to produce the corresponding output! The worst thing that can happen to Haskell code is that it gets executed.

In Haskell, this algorithm can be more easily read as:

primes = let
	one      = \cont (x:xs) ->   x:cont xs
	zero     = \cont (x:xs) -> '0':cont xs
	succ n   = one . n
	primes n = '1':f (primes sn) where
		f  = sn f
		sn = succ n
	in
		'0':'0':primes zero
main = putStr primes

Diagram

Now that we have an intuition about how these pieces come together to form the sieve, we can identify the component pieces within the complete diagram.

From left to right, we have the first two hardcoded 0s as the monolithic vertical bars.

Then we have two placeholders for the first instance of our mask n, with the value of false immediately to its right:

The gross middle bit is the representation of the body of sieve:

sieve = fix (\sv\n\x\xs\p.
			p x (
				x xs (xs (fix n))
				(sv (\r.oneS (n r)))
			)
		);

Next is the visualization of oneS

On the far right side, we can pick out consz:

let
	zero = \x y -> x;
	one = \x y -> y;
	consz = \xs p -> p zero xs
in
	consz

Tromp's BLC sieve, translated from BLC to normal λ-calculus is:

λa.( λb.b ( b (( λc.c c) ( λc.λd.λe.e ( λf.λg. g)( ( λf.c cf( ( λg.g g) ( λg.f ( g g )) )) ( λf.λg.λh.λi.i g( h ( d f ) )) ))( λc.λd.λe.b ( e c )) ) )) ( λb.λc.c ( λd.λe. d)b)

which color codes to:

and (after a singular normal-order, not applicative order beta reduction) produces an intermediate result:

from which we can start to see some shape emerging with two leading true = composite entries, two additional placeholders for the forthcoming false = prime entries for 2, 3, and the recursive structure on the tail.

Footnotes

Mealy, George H. "A Method for Synthesizing Sequential Circuits." Bell Systems Technical Journal, 1955. ↩
Mealy, an academic, uses indecipherable notation, and the Wikipedia entry also uses its own notation which I didn't care for ↩
ba dum tss ↩
double ba dum tss ↩
↩