Reflection Without Remorse

25 min read

Why is it there? asked Rosa.

It occurred to me all of a sudden, said Amalfitano, it’s a Duchamp idea, leaving a geometry book hanging exposed to the elements to see if it learns something about real life.

You’re going to destroy it, said Rosa.

Not me, said Amalfitano, nature.

You’re getting crazier every day, you know, said Rosa.

Amalfitano smiled.1

Introduction

The subject of this paper is some functional programming voodoo and complexity analysis.

That's NOT fascinating you, nerd

The paper is called Reflection without Remorse, by Atze van der Ploeg and Oleg Kiselyov. The authors introduce the problem of left-associated monadic binds performing asymptotically worse than their right-associated equivalents. They acknowledge that there are existing (unsatisfactory & inflexible) workarounds, and present some clever, more robust solutions using existing data structures.

The problem introduced is pretty straightforward; even if it's not something you've consciously run into before, it made sense to me as a problem the moment it was stated. The solution as presented in the paper was a bit heavy on the Haskell, so I wanted to take a stab at explaining it back to myself here.

The Problematic Pattern

Given a recursive data type with an associative, binary operation \oplus (e.g. arithmetic addition) where

$$\begin{aligned} O(x \oplus y) &= |x| \\ |x| + |y| &\geq |x \oplus y| \end{aligned}$$

we run into the issue where

$$O((x \oplus y) \oplus z) > O(x \oplus (y \oplus z))$$

even though these are semantically equivalent: the left-associative grouping performs worse due to the asymptotic overhead incurred by the recursive structure of our data type, which requires us to traverse the entire structure before reaching the end (the canonical example is appending to a linked list).

The authors start with some concrete examples (aforementioned list concatenation & tree substitution), but later indicate that the problematic pattern is more pervasive than just sequential collection joinery (which ends up being a good thing, since that implies that the solution to this pattern is equally pervasive).

ADTs

Before tracing the examples, let's first take a step back to understand what is meant by the terms Algebraic and Recursive Data Type.

In the paper, the authors use Haskell. I'm not proficient in Haskell, but I know enough to recognize that Haskell is correct. I/humans tend to be wrong frequently and expect the systems that we've constructed to be wrong in ways that align with our maladjusted mental models (we call this "correct behavior"). However, when we are wrong w.r.t. Haskell, Haskell is right. Haskell is divinely inspired, so it makes sense that I can hardly read it and that the hoogle docs burn my skin.

Data Types

We can define a concrete type using the data keyword like so

data Point = Point Integer Integer
--  (1)      (2)        (3)

providing a name for the type (1), a data constructor (2), and the comprising types (3).

note that a Point defined like this will always consist of two Integers – and we'd need a different data type entirely if we wanted to construct a Point out of, say, Doubles.
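
For instance, without type parameters we'd be stuck writing a near-identical duplicate (DPoint is a hypothetical name, just to illustrate):

data DPoint = DPoint Double Double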

Algebraic Data Types

It's just about always more useful to abstract over commonalities, so we might want to broaden our definition of a Point. It'll still be a Product of two things, we just needn't care about the type of things:

data Point a = Point a a

a is the type parameter for our Point. Additionally, we might want to put some constraints on a if we really want it to preserve numeric properties s.t. it still behaves like a Cartesian product of e.g. (x, y) which we might want to plot or something.

In Scala, we'd put the type constraint in the definition of our trait/class/whatever, or in the type declaration itself like so:

trait Point[A: Numeric](x: A, y: A) { // Scala 3: trait parameters + a context bound
    def add(other: Point[A]): Point[A]
}

but in Haskell, the convention is to put the constraints on the things that use the types, e.g.

addPoints :: Num a => Point a -> Point a -> Point a
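
A minimal sketch of an implementation to go with that signature (my code, not the paper's), pattern matching on the constructor from above:

addPoints :: Num a => Point a -> Point a -> Point a
addPoints (Point x1 y1) (Point x2 y2) = Point (x1 + x2) (y1 + y2)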

We can also define alternative data constructors for our type like so:

data <type> = <subtype/constructor> <arg type>
            | <alternatives>

for example, from some Haskell wiki that looks trustworthy,

data Maybe a = Just a
             | Nothing

which makes sense. We can have a Maybe of anything, regardless of the type of thing. We can algebraically abstract that thing's type away and just replace it with a variable a, and we can also have another valid instance of a Maybe that is the absence of the thing a.
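
As a quick illustration of why this is handy (a hypothetical helper, not from the paper): a safe head on the built-in list type returns Nothing instead of blowing up on an empty list:

safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x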

This is the bare minimum explanation of both ADTs and Haskell that we need to advance to the main topic; a better explanation is outside the scope of this post.2

Recursive Data Types

A Recursive Data Type is one that is self-referentially defined. Read more about that here.

A classic example is that of a tree. In Haskell, we might recursively define a tree in terms of Node and Leaf constructors, since every element of a tree is either a leaf with no children, or an internal node whose children are themselves trees:

data Tree = Node Tree Tree
          | Leaf

Similarly, the primitive List type in Haskell is defined as a singly-linked list like so:3

data List a = Nil
            | Cons a (List a)
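
So the built-in list [1, 2, 3], spelled out in this representation, is just nested Conses terminated by Nil:

oneTwoThree :: List Int
oneTwoThree = Cons 1 (Cons 2 (Cons 3 Nil))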

The Binary Operation

With our RDT, we can approach the second piece of requisite knowledge needed to understand the problem: the operation we perform on our Recursive Data Type. In the problematic pattern, any associative, binary operator will do.

  • Binary meaning it takes two inputs and maps them to a single output
  • and Associative meaning that the output is independent of the groupings of the operands.

Formally, the associative law is:

$$(a \oplus b) \oplus c = a \oplus (b \oplus c)$$

for some operator $\oplus$. Here I'm using an oblique operator symbol to keep the problem abstract, so as not to prematurely confuse myself by thinking strictly in terms of arithmetic addition, but arithmetic addition is obviously also a binary, associative operator. In practice, addition, multiplication, max, min, and list concatenation (++) are all associative binary operators.

In the recursive case, the cost of the binary operator is proportionate to the magnitude of the first/sinistral operand:

$$O(x \oplus y) = |x|$$

This makes more sense in the context of list concatenation for a naive list implementation like the one introduced above, where we'd have to traverse the entirety of the first list $x$ to reach its end in order to append $y$ to it.

You know how in the movie The Big Short, Adam McKay decided to transparently patronize the audience by being like "here's Margot Robbie to explain a room-temp IQ economics concept to you because you're dumb"? I will now do that with Python:

x = myList  # init an iterator at the head of myList
while x.next is not None:  # walk all |x| nodes to reach the tail
    x = x.next
x.next = y  # the actual append is O(1) once we're finally here

Now, certainly we can devise a specific solution, i.e. the galaxy-brained idea of keeping track of both the head and the tail of our list, getting a reference to the end in constant time and skipping the linear-cost traversal entirely, at the cost of some fixed overhead in space (to store the references to arbitrary positions or states in our data structure) and time (to update those references whenever we mutate the data structure). But the problematic pattern is general, and poses problems not unique to singly-linked lists. The paper focuses on list concatenation, which we'll continue to run with, but also describes tree substitution and The General Problem in greater detail.
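
For the curious, here's roughly what that galaxy-brained fix can look like in a pure setting – the classic two-list queue (my sketch, not the paper's): appends go to a reversed back list in O(1), and the linear traversal is deferred until we actually observe the structure.

-- constant-time appends, deferred traversal
data Queue a = Queue [a] [a]   -- front list, reversed back list

snoc :: Queue a -> a -> Queue a
snoc (Queue f b) x = Queue f (x : b)   -- O(1): push onto the reversed back

toList :: Queue a -> [a]
toList (Queue f b) = f ++ reverse b    -- pay the traversal once, on observation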

Singly-Linked Lists

And here lies the problem: depending on the associativity-priority/preference of our language or data type, we can incur a different runtime complexity for equivalent expressions, for all instances $x, y, z$ of our recursive data type:

$$\begin{aligned} &\text{left} &\quad &\text{right} \\ &(x + y) + z &\quad &x + (y + z) \\ \implies &O(2|x| + |y|) &\quad \implies &O(|x + y|) \end{aligned}$$

In the $\text{left}$ example, we have to traverse all of $x$ to append $y$ to the end. Then we have to traverse all of $x$ and then all of $y$ before we can find the last node, which we update to point at $z$.

This might not be the end of the world if $|x| \approx |y|$, but for $|x| \gg |y|$ it's apparently bad.

In general, the problem occurs with any associative operator, operating on some recursive data type, that traverses its left argument but not its right, and for which the following monotonicity requirement holds:

$$|x| + |y| \geq |x \oplus y|$$

Tree Substitution

Similarly, the problem arises for an operation which replaces the leaves of a tree with another tree:

data Tree = Node Tree Tree
          | Leaf

-- the substitution op, which takes two trees and returns a tree
(↩) :: Tree -> Tree -> Tree
Leaf       ↩ y = y
(Node l r) ↩ y = Node (l ↩ y) (r ↩ y)

The authors point out that the performance issue is the same: reducing $x \hookleftarrow y$ to normal form costs $|x|$ steps, so $(x \hookleftarrow y) \hookleftarrow z$ runs in $2|x| + |y|$ steps and the equivalent expression $x \hookleftarrow (y \hookleftarrow z)$ runs in only $|x| + |y|$.

Structuring the problem with trees in this manner underscores its generality. Whereas we just handwaved the problem away via efficiently catenable sequence data structures, no such established partial solution even exists for trees...

Should we investigate a new specialized data structure for trees, or browse the literature to see if someone else has already invented it?

(Hint: No).

lol.

Solving the Problem Poorly

Given a concrete List implementation:4

-- I don't know what the conventional Haskell fmt is
-- and I don't care bc anyone who knows better than me
-- would've clicked off in disgust by now
data List a = Nil
            | Cons a (List a)

-- defn of list concatenation
(++) :: List a -> List a -> List a
x ++ y = case x of
           Nil      -> y
           Cons h t -> Cons h (t ++ y)

we could explicitly dodge the issue with list concatenation (++) by grouping our concatenations to the right, like so:

-- default/problematic `++` impl i.e. `left` that takes 3 lists
-- like our x, y, z and returns a new list
left :: List a -> List a -> List a -> List a
left x y z = let
    xy = x ++ y -- |x| steps to concat x and y
    in
        xy ++ z -- |x + y| steps to concat xy and z
    -- therefore `left` is O(2|x| + |y|)

versus

-- based `right` associative impl of ++
right :: List a -> List a -> List a -> List a
right x y z = let
    yz = y ++ z -- |y| steps to concat y and z
    in
        x ++ yz -- |x| steps to concat x and yz
    -- therefore `right` is O(|x + y|)
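
To actually watch the gap at the REPL, here's a throwaway driver (mine: fromHs and lengthL are assumed helpers, and it presumes the List, (++), left, and right definitions above, with Prelude's (++) hidden):

fromHs :: [a] -> List a
fromHs = foldr Cons Nil

lengthL :: List a -> Int
lengthL Nil        = 0
lengthL (Cons _ t) = 1 + lengthL t

main :: IO ()
main = do
  let x = fromHs [1 .. 1000000 :: Int]  -- the long left operand
      y = fromHs [0]
      z = fromHs [0]
  print (lengthL (left x y z))   -- walks x twice
  print (lengthL (right x y z))  -- walks x once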

Now, the astute reader might say

"Peter, these are but negligible coefficients in a sea of infinite elements. Get thee to a library."

Nay, dear reader, and damn you! $O(2|x|)$ can become quadratic! For left-associative binary operators, it's possible that the left operand ($x$) is itself the result of a binary operation on an RDT, s.t. the process can blossom from a linear $O(2|x|)$ into a painful summation over all recursive terms contained within the left operand!

The right-associative expression on the left can be computed in

$$\sum_{i=1}^{n-1} |a_i|$$

and conversely, the (poorly captioned, IMO) left-associative expression on the right – an expression of the form $(x \oplus y) \oplus z$ – runs in $2|x| + |y|$ steps.

If we iterate this pattern, we obtain a left-associated expression which –although equivalent to the corresponding right-associative expression– performs far worse:

$$\begin{aligned} &(((a_1 \oplus a_2) \oplus a_3) \cdots \oplus a_{n-1}) \oplus a_n \\ \implies &\sum_{i=1}^{n-1} (n-i)|a_i| \\ \implies &O(n^2) \end{aligned}$$

And for even just constant-sized elements, i.e. $|a_i| = 1$, we quickly see how the left-associated expression becomes asymptotically slower than the equivalent right-associated expression

which is BAD!
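
To make that concrete: with $|a_i| = 1$, the cost sum is just a triangular number,

$$\sum_{i=1}^{n-1} (n-i) \cdot 1 = (n-1) + (n-2) + \cdots + 1 = \frac{n(n-1)}{2} = O(n^2)$$

while the right-associated version costs $\sum_{i=1}^{n-1} 1 = n - 1 = O(n)$.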

Solving the Problem Poorly: part 2

Ken Iverson completely dodged this issue by just making all operations in APL right-associative.

Solving the Problem Partially: CPS

The continuation-passing style (CPS) transformation only partially solves this issue by providing efficient means for performing the operation: it generates right-associated expressions via derived (right-associative) data types.

For our list example, the CPS transformation uses a derived DiffList which is good at being concatenated, but poor at being "observed":

-- A Haskell `type` is just an alias for an existing type
type DiffList a = List a -> List a

fromDiff :: DiffList a -> List a
fromDiff x = x Nil -- O(|x|), expensive to convert back!

toDiff :: List a -> DiffList a
toDiff l = \t -> l ++ t -- O(1), free to convert to
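
Usage-wise (my sketch, not the paper's code): concatenating DiffLists is just function composition, so building is all O(1), and the single fromDiff at the end is where the linear cost finally lands – note that applying the composition to Nil unfolds into a fully right-associated x ++ (y ++ (z ++ Nil)):

-- each (.) is O(1); the one fromDiff pays the full traversal
xyz :: List Int
xyz = fromDiff (toDiff x . toDiff y . toDiff z)
  where
    x = Cons 1 Nil
    y = Cons 2 Nil
    z = Cons 3 Nil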

The CPS transform is one way to provide rudimentary operations that circumvent the problematic pattern by deriving right-associativity. And, as we can see, the CPS implementation is not dependent on the type parameter a; this DiffList will work for anything. However, we can also observe that – though it's free to compose DiffLists together (constant time complexity is as good as it gets barring time travel) – converting back out of a DiffList is expensive in the long run.

The idea being that for our new right-associative primitive, we can quickly perform $\oplus'$, and only incur the asymptotic penalty when we convert back to our non-derived (dare I say ¿integrated?) type.


The general problem with the CPS transform is that when we want to inspect our accumulating $X_{ij}$ and/or its constituents $X_i, X_j$, we have to fromDiff it, costing at least a left-associative asymptotic overhead penalty.

Using a naive left-associative $\oplus$, we can inspect for free, but combine at cost.

And conversely, with our CPS-transformed $\oplus'$, we can combine for free, and inspect at cost. But we want both, in the form of a generalized ADT, dammit!

fromDiff is the inspection, and the linear cost per reflection is the titular remorse.

difference lists only solve performance problems if our usage of lists is strictly separated into a build (i.e. concatenation) phase and an observation phase. If we alternate between building and observing, as is often needed, then performance problems will resurface.

Solving the Problem: Mutually Recursive Data Types

The authors point out that previous partial solutions only treat expressions of the form

$$a_0 \oplus a_1 \oplus \cdots \oplus a_n$$

as sequences implicitly, when in fact this structure can be used to our advantage explicitly.

Implicitly, such expressions are trees at runtime, where leaves are elements and nodes are (delayed) function applications. The authors show that by making these sequences explicit, we can choose a more suitable data structure which alleviates many of our performance problems.

In the prior sketch of DiffLists, nothing in our base type $X$ relies on the derived type $X'$.

$X'$ references $X$, but $X$ does not reference $X'$! If we derive two types from our base type $X$, for viewing ($\textsf{View}[X]$) and efficient combination ($\textsf{Combine}[X]$) respectively, we can solve the problem of dividing the observation and combination phases. The crux is that the view type is concrete, rather than recursive, allowing us to observe the contents of $X$ in constant time.

For the example problem of tree substitution, the authors propose an explicit representation of the sequence of expressions using a TreeExp:

type TreeExp = CQueue Tree

where CQueue is an efficient sequence data structure,5 in the sense that observation of both the head and the tail of the sequence, as well as concatenation to another sequence are all amortized constant time operations.
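
Paraphrasing that interface as a typeclass (illustrative names loosely following Data.Sequence's viewl convention, not the paper's exact signatures):

-- the operations we need, all ~O(1) amortized
class Sequence s where
  empty     :: s a
  singleton :: a -> s a
  (><)      :: s a -> s a -> s a   -- concatenate two sequences
  viewl     :: s a -> ViewL s a    -- observe the left end

data ViewL s a = EmptyL            -- the sequence was empty
               | a :< s a          -- head element and remaining tail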

To support efficient partial conversion (for observation of intermediate states of our expression sequence, which CPS does not do), we change the type of the children of our Tree to explicitly represented expressions:

data Tree = Node TreeExp TreeExp
          | Leaf

such that the tree substitution operator $\hookleftarrow$ (which, in this context, can be thought of as "applyExpression") no longer takes a single tree as its second argument, but rather an explicitly represented expression that results in a tree:

(↩) :: Tree -> TreeExp -> Tree
Leaf       ↩ y = val y
(Node l r) ↩ y = Node (l ++ y) (r ++ y)

where:

  • if ↩ is called on a Leaf, then val (a function, defined below, that converts a TreeExp back to a Tree) is applied to y,
  • otherwise, if ↩ is called on a child-bearing Node, the TreeExp is joined onto each of the original node's subtree expressions.

The ++ operator on Nodes is actually a $\hat\bowtie$ (I shit you not, the authors chose $\hat\bowtie$ for their constant-time concatenation operation,,,). Importantly, $\hookleftarrow$ is not recursive anymore! It's now a constant-time operation.

Improving upon CPS, conversion between the explicit representation of the expression and the result of that expression is defined by the (perhaps confusingly named) val operator:

val :: TreeExp -> Tree
val s = case viewl s of
         EmptyL -> Leaf
         h ⊲̂ t  -> h ↩ t

Let's break this down,

  • viewl is a function that allows us to view the sequence from the left (providing isEmpty, head, and tail ops)
  • The fucked up $\hat\lhd$ operator is prepend

Since $\hookleftarrow$ is not recursive, neither is val! Converting an explicitly represented expression to an observable value does not necessitate converting the entire explicitly represented expression, just the top (left) of the tree via val (in this example of the tree substitution problem).

Going the other direction, converting a tree to an explicitly represented expression can be achieved by constructing a singleton sequence:

expr :: Tree -> TreeExp
expr = singleton

The $\hookleftarrow$ on the original type can then be redefined as:

(↩') :: Tree -> Tree -> Tree
l ↩' r = l ↩ expr r

which disappears all our performance problems. Both

$$a \mathbin{\hat\hookleftarrow} (b \mathbin{\hat\hookleftarrow} c) \quad \text{and} \quad (a \mathbin{\hat\hookleftarrow} b) \mathbin{\hat\hookleftarrow} c$$

are constant time operations, and their conversions are also efficient, allowing us to arbitrarily alternate between building and inspecting sequences of tree substitutions.

Type-Aligned Sequences

The above example works for non-generic trees, but what about the general case?

we must then explicitly represent expressions of the form:

$$m \gg= f_1 \gg= f_2 \gg= f_3 \cdots \gg= f_n$$

(where the $\gg=$ operator is the monadic bind, which takes a monadic value and a function that returns another monadic value, sequentially executes them, and unwraps the result – which is exactly the pattern we have with the solution presented above).

However, this causes a problem: each $f_i$ has type $a \rightarrow \textsf{Tree } b$ for some $a, b$ which might differ between elements, meaning we can't use a regular sequence!

We instead need a generalized interface parameterized over heterogeneous types:

we need a $\textsf{Seq}[a \rightarrow c]$, which doesn't exist.

In Haskell, this would look like a sequence parameterized by a type constructor c such that each element is of type c a b for some a b. The alignment stipulation would be that the last type argument to c of some element is the first argument to the subsequent c in the sequence. If we set the type constructor c to ->, then we get type aligned sequences of functions: the output of a function is then always the input type to the next function!
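
Here's a minimal GADT sketch of that idea (mine, written as a plain cons-list for clarity rather than the paper's efficient CQueue):

{-# LANGUAGE GADTs #-}

-- a type-aligned sequence over a type constructor c: each element's
-- output type is the next element's input type
data TASeq c a b where
  TNil  :: TASeq c a a                          -- empty: input = output
  TCons :: c a x -> TASeq c x b -> TASeq c a b  -- head feeds the tail

-- concatenation only type-checks when the seam types line up
tappend :: TASeq c a x -> TASeq c x b -> TASeq c a b
tappend TNil        s = s
tappend (TCons h t) s = TCons h (tappend t s)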

While we're deisgning our preferred API for this magic interface, we also want support for:

  • view or pop for reflection
  • create
  • append
  • concat

all in amortized constant time, or even $O(1)$ if possible (it is).

To accomplish these lofty requirements, we'll define our sequence in terms of the recursive operations rather than the type.

... Years of functional programming and efficient data structures, and no one thought to compose them...

Haskell's standard library data structures are amortized-constant for the operations you'd expect.

  • append is $O(n \log n)$, but there are also common libraries which provide a bunch of functional (in the paradigmatic sense) data structures
  • This technique is useful for library maintainers, and can/should be opaque to clients
  • We took something that was asymmetric and made it symmetric (and fast)

The General Solution

For a recursive data type $X$ and a monotonic, left-associative binary operator, the solution is:

  1. In the definition of the data type $X$, replace all self-references with a type-aligned sequence which represents expressions involving the problematic operator explicitly
  2. Instead of implementing the original operator, implement a version whose right argument is an explicitly represented expression, and use efficient concatenation to implement it
  3. Define functions to convert between values and explicitly represented expressions
  4. Re-define the operator on the original type $X$, using the new version of the operator and a conversion of the right-hand side to an explicitly represented expression
  5. Use the conversion functions to move between explicitly represented expressions and values where needed

A TAS can be understood as a composition of functions $f_1 \circ f_2 \circ f_3 \circ \cdots \circ f_n$. These are only composable if the sequence is well-typed. That is, the output of $f_{i-1}$ shares the same type as the input to $f_i$ for all $i$ in our sequence of functions.

This eliminates a whole category of implementation bugs since the sequence has to be well typed.
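
Continuing the hypothetical TASeq sketch from above: instantiating c to (->) collapses the whole sequence into one well-typed composition, and any misaligned pair of functions simply won't compile.

-- fold a type-aligned sequence of functions into a single function
compose :: TASeq (->) a b -> (a -> b)
compose TNil        = id
compose (TCons f t) = compose t . f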

If it wasn't dense enough already, the remainder of the paper dives into the ~folds of category theoretic implications and broader use with specific monoids.

The feelings I hold towards Haskell, and functional programming in general, are similar, I think, to those that Amalfitano feels about Dieste's Testamento geométrico.

Footnotes

  1. Bolaño, Roberto. 2666. 2008.

  2. become a tree hugger, Learn You a Haskell

  3. https://hackage.haskell.org/package/base-4.18.0.0/docs/GHC-List.html

  4. This is adapted for readability. Here's the actual Haskell implementation.

    ----------------------------------------------
    --               append
    ----------------------------------------------
    
    -- | Append two lists, i.e.,
    --
    -- > [x1, ..., xm] ++ [y1, ..., yn] == [x1, ..., xm, y1, ..., yn]
    -- > [x1, ..., xm] ++ [y1, ...] == [x1, ..., xm, y1, ...]
    --
    -- If the first list is not finite, the result is the first list.
    
    (++) :: [a] -> [a] -> [a]
    {-# NOINLINE [1] (++) #-} -- We want the RULE to fire first.
                              -- It's recursive, so won't inline anyway,
                              -- but saying so is more explicit
    (++) [] ys = ys
    (++) (x:xs) ys = x : xs ++ ys
    
    {-# RULES
    "++" [~1] forall xs ys. xs ++ ys = augment (\c n -> foldr c n xs) ys
    #-}
    
    -- |'otherwise' is defined as the value 'True'. It helps to make
    -- guards more readable. eg.
    --
    -- > f x | x < 0 = ...
    -- > | otherwise = ...
    otherwise :: Bool
    otherwise = True
    

    god haskell src rocks hackage

  5. Just like many Python libraries, whenever speed is of the essence, we outsource the computation to its implementation in C. Kidding, of course – the C stands for Concurrent.