Useless Solutions Call for Useless Problems

We're talkin' quantum; get in or get out

Useless Solutions Call for Useless Problems

Consider a toy problem wherein we're given a black box function $f : \{0, 1\}^4 \rightarrow \{0, 1\}^4$ which operates on four 1-bit inputs which we'll label $b_1, b_2, b_3$ and $a$ . Our function $f$ is perhaps better interpreted as a “circuit” with the following behavior:

f = \begin{cases} a \leftarrow \lnot a \text{ if } b_1 \\ a \leftarrow \lnot a \text{ if } b_2 \\ a \leftarrow \lnot a \text{ if } b_3 \\ \end{cases}

The catch is that any subset of those cases might be ignored, deleted, faulty, flipped by a gamma ray,¹ etc. E.g., we might receive an $f$ with the second case “deleted” resembling:

f = \begin{cases} a \leftarrow \lnot a \text{ if } b_1 \\ \; \\ a \leftarrow \lnot a \text{ if } b_3 \\ \end{cases}

For an instance of this problem with $n$ inputs, there are $2^{n-1}$ possible subsets of “missing” statements which can be interpreted as toggles acting on our accumulator. ( $2^{n-1}$ since we discount the $a$ term which accumulates the effect of the other inputs).

We can enumerate the set of all possible configurations of this black box function $f$ with 4 arguments (three input bits and one accumulator bit):

\begin{aligned} F_3 = \begin{Bmatrix} &\; &f_\emptyset, &\; \\ &f_{b_1}, &f_{b_2}, &f_{b_3}, \\ &f_{b_1,b_2}, &f_{b_2,b_3}, &f_{b_1,b_3}, \\ &\; &f_{b_1,b_2,b_3} &\; \end{Bmatrix} \end{aligned}

Where the subscripts correspond to the cases that are present in that instance of the function. The problem statement is: given some unknown function $f \in F_n$ , we want to determine which $f$ we’re dealing with in as few invocations or “queries” to $f$ as possible. ( $f$ is conventionally referred to as an oracle for this kind of problem).

A naive approach might be to select an arbitrary initial binary string, say $\bar 0$ , flip the first bit, pass it to $f$ and observe what happens. For $f \in F_3$ this might look like:

f(1000) \rightarrow 1001

This tells us that the first case (toggling $a$ contingent on toggling $b_1$ ) is present in our instance of $f$ . Note that this logarithmically eliminates fully half of the $2^3$ possible $f$ s (the ones where $b_1$ is absent, deleted, corrupted, etc.):

\begin{aligned} F_3 = \begin{Bmatrix} &\; &\color{red}f_\emptyset\color{black}, &\; \\ &f_{b_1}, &\color{red}f_{b_2}\color{black}, &\color{red}f_{b_3}\color{black}, \\ &f_{b_1,b_2}, &\color{red}f_{b_2,b_3}\color{black}, &f_{b_1,b_3}, \\ &\; &f_{b_1,b_2,b_3} &\; \end{Bmatrix} \end{aligned}

We can repeat with another input string to target the behavior of another of the cases, say:

f(0100) \rightarrow 0100

which tells us that the case for $b_2$ is missing since $a$ was not toggled. So we can eliminate another half of the remaining possibilities:

\begin{aligned} F_3 = \begin{Bmatrix} &\; &\color{red}f_\emptyset\color{black}, &\; \\ &f_{b_1}, &\color{red}f_{b_2}\color{black}, &\color{red}f_{b_3}\color{black}, \\ &\color{red}f_{b_1,b_2}\color{black}, &\color{red}f_{b_2,b_3}\color{black}, &f_{b_1,b_3}, \\ &\; &\color{red}f_{b_1,b_2,b_3}\color{black} &\; \end{Bmatrix} \end{aligned}

And, finally, we isolate the behavior of the $b_3$ branch:

f(0010) \rightarrow 0011 \implies \begin{aligned} F_3 = \begin{Bmatrix} &\; &\color{red}f_\emptyset\color{black}, &\; \\ &\color{red}f_{b_1}\color{black}, &\color{red}f_{b_2}\color{black}, &\color{red}f_{b_3}\color{black}, \\ &\color{red}f_{b_1,b_2}\color{black}, &\color{red}f_{b_2,b_3}\color{black}, &f_{b_1,b_3}, \\ &\; &\color{red}f_{b_1,b_2,b_3}\color{black} &\; \end{Bmatrix} \end{aligned}

So we know that our given function must be $f_{b_1,b_3}$ , but that took $n=3$ total invocations for a mystery function with three inputs. Can we do better? We might experiment with more “sophisticated” input string combinations, but we’ll quickly find that regardless of our input to $f$ , we can never eliminate more than half the possible remaining answers at any iteration of this sluethy procedure.

To prove this, let’s consider what information we gain by sampling $f$ . Or –donning our Shannon deerstalker– perhaps a better question to ask is: how much information do we gain?

The answer is –perhaps obviously from the framing of the problem statement thus far– precisely 1 bit of information which gets encoded in the last bit of the output $a$ . So, with classical computing techniques, the answer is: … no, we cannot do better than $n$ queries to $f$ .

Suppose that there did exist a procedure which lets us identify the $f \in F_3$ with only two invocations. We would have at our disposal 2 bits of information:

\begin{aligned} f(\bullet) = …a_1 \\ f(\bullet) = …a_2 \\ \end{aligned}

Leaving us with four possible combination of $a_1, a_2$ :

a_1a_2 = \{ 00, 01, 10, 11\}

But there are eight possible choices for $f \in F_3$ and 2 bits of information still only inform on four of those possibilites, so we can conclude that any such procedure which claims perfect accuracy is WRONG since there’s simply not enough information. Not with classical computing techniques, that is. With quantum computing, we can get the number of invocations of $f$ down to one singular sample for any naturally numbered $n$ by leveraging some sick linear algebra on the properties of qubits.

Quantum

A qubit is the fundamental datatype in quantum computing. Typically denoted $|\psi\rangle$ , with the bracket notation indicating that the value of $\psi$ can exist somewhere in between the two basic states: $|0\rangle$ and $|1\rangle$ . The measure of a qubit’s propinquity to each basic state is denoted as an amplitude:

\begin{aligned} -1.0 \leq \alpha_{|0\rangle} \leq +1.0 \\ -1.0 \leq \alpha_{|1\rangle} \leq +1.0 \end{aligned}

And we can omit the amplitudes' subscript when the basic state being referenced is unambiguous.

Crucially, we don’t need a concrete implementation of a device which can replicate superposition (though this is more achievable than we might initially assume) in order to reason about quantum quantities.

Properties of Qubits

Measuring a qubit’s amplitude collapses it to one of its basic states depending on the amplitudes which can be thought of as probabilities of observing the corresponding basic state. The exact rule governing which basic state is observed when a qubit collapses is given by the amplitude of that basic state squared:

p\Big[|\psi\rangle = |0\rangle\Big] = \alpha^2

E.g. for a qubit $|\psi\rangle$ with amplitude $\alpha_{|0\rangle} = 0.8$ we know that the probability of observing one of the two possible basic states must sum to one, so we can infer the amplitude of the other unknown basic state $\alpha_{|1\rangle}$ :

\begin{aligned} &p\Big[|x\rangle = |0\rangle\Big] + p\Big[|x\rangle = |1\rangle\Big] = 1 \\ &p\Big[|x\rangle = |0\rangle\Big] = 0.8^2 = 0.64 \\ \implies &p\Big[|x\rangle = |1\rangle\Big] = \sqrt{1- 0.64} = \sqrt{0.36} = 0.6^2 \end{aligned}

Note that this definition also allows for negative amplitudes, e.g. this is an equally valid quantum state for $|\psi\rangle$ to be in:

Since $\alpha_{|0\rangle}^2 + \alpha_{|1\rangle}^2 = 1$ .

Similarly to classical bits, for $n$ qubits, there are $2^n$ possible measurements which are the $2^n$ possible basic state configurations after measuring each of the qubits’ amplitudes. However, collections of qubits form quantum systems, the constituents of which cannot be meaningfully considered in isolation.

E.g. For $n=2$ qubits, we have a system of 4 basic states:

With some amplitude on each of the basic states where the sum of the squares of said amplitudes is one:

\begin{matrix} & p\Big[|x\rangle = |00\rangle\Big] = &0.35^2 \\ & p\Big[|x\rangle = |01\rangle\Big] = &0.2^2 \\ & p\Big[|x\rangle = |10\rangle\Big] = &(-0.47)^2 \\ & p\Big[|x\rangle = |11\rangle\Big] = &0.79^2 \\ \hline & & \sim 1 \end{matrix}

While it may seem natural to assume that each qubit on its own has distinct amplitudes on each of its basic states, this is not the case. Together, the $n$ qubits jointly have amplitudes on each of their $2^n$ basic states. This property is called quantum entanglement and is precisely what gives rise to the clever solutions found in quantum programming which are not possible with conventional approaches.

Quantum Programming

A quantum program takes as input $n$ qubits in some valid superposition of the $2^n$ basic states and takes them to some other (excluding the identity QP, I guess) valid super positions. In other words, a quantum program is any set of operations which satisfies the pre- and post-condition:

\sum_i^{2^n} \alpha_i = 1

Where $i$ is expressed in base 10². A quantum operation we might be interested in to solve the problem from earlier would be qubit negation on $n$ qubits. This can be achieved by "swapping" amplitudes of corresponding basic states. The operation can be visualized on a $n-$ dimensional cube. E.g., for $n=3$ :

The “toggle” or negation operation swaps a corner on the face of the cube where $q_i = 0$ with The corner where $q_i = 1$ , and $i \in \{1, 2, …, n\}$

This operation is one of three named for Nobel prize winner Wolfgang Pauli, and is denoted $X$ or $\sigma_x$ . We’ll cover the other two operations in a following section. Note that, without even knowing how to quantitatively represent this negation operation, we know that it is a valid one since it doesn’t violate the pre- and post-condition of squared amplitude summation to one.

To avoid having to bother with validation summations for each quantum operation we might want to define, we can institute an abstract constraint that these operations only define how each of the basic states are transformed in terms of one another rather than modifying amplitudes directly. This opens many direct analogs to linear algebra that might already be showing around the cracks of this ELI25AIDKALA³ explanation.

This abstraction constraint begs the question, though, what about non-basic states? E.g. what if we composed a 1-bit probabilistic operator with our negation “gate” which toggles the basic state of a qubit of with $p = 2/3$ and does nothing the rest of the time:

RNG(q) = \begin{cases} \sigma_x(q) &\text{ if } p \sim U < 2/3 \\ \\ Id(q) &\text{ if } p \sim U \geq 2/3 \end{cases}

We can visualize this behavior as a tree without having to worry our pretty little brains with any quantitative conception of what the hell we’re talking about:

We can see that the probability of observing any of the basic states of this system is just the product of the probabilities of descending down any branch $\pi$ of this amplitude tree that results in that basic state e.g. $|01\rangle$ :

p\big[|AB\rangle = |01\rangle\big] = \prod_{i \in \pi_{|01\rangle}} p_i = 1 \times \frac{2}{3} \times 1 = \frac{2}{3}

Quantum program analysis is almost identical to this RNG example, the only difference being that quantum instructions operate on amplitudes which can be negative whereas a classical stochastic decision tree operates on probabilities on the unit interval. Conveniently for us, this means that some of the amplitudes might cancel each other out, whereas in the classical context, probabilities are strictly additive (in the monotonically increasing sense of the word).

So far the two operations we’ve defined ( $\sigma_x$ and simple conditionals), don’t offer any further utility than those gates which are already available in classical circuitry as evidenced by their amplitude trees which don’t introduce any quantum branching, even for more "complex" arrangements therein:

\begin{aligned} p\big[C = 1 \big] &= \sum_{\pi \in \{\pi_{|\bullet\bullet1\rangle}\}} \prod_{i \in \pi} p_i \\ &= \Big(\frac{1}{3}\cdot\frac{2}{3}\cdot1\cdot1\cdot1\Big) + \Big(\frac{1}{3}\cdot\frac{2}{3}\cdot1\cdot1\cdot1\Big) \\ &= 2/9 + 2/9 = 4/9 \end{aligned}

It’s now time to introduce non-trivial non-basic qubit states. Let’s define a unary function: $Rotate: |\psi\rangle \rightarrow |\psi\rangle$ with the following amplitude tree:

We can easily verify that it’s a valid quantum operation since it maps valid quantum states to other valid quantum states. Now suppose we have another operation $f$ which takes some qubit in the zero state and transforms it to a qubit with some valid amplitudes $x,y$ on each of its basic states s.t. $x^2 + y^2 = 1$ . We can (non-rigorously) check that $Rotate$ is valid by passing $f(q)$ as its input and examining the resultant amplitude tree:

Collapsing the amplitudes to measure the distribution over basic states, we get:

\begin{aligned} |0\rangle &= 0.6x -0.8y \\ |1\rangle &= 0.8x + 0.6y \\ \implies 1 &= \alpha_{|0\rangle}^2 + \alpha_{|1\rangle}^2 \\ &= (0.6x -0.8y)^2 + (0.8x + 0.6y)^2 \\ &= 0.36x^2 + 0.64y^2 \color{red}-2(0.48)xy\color{black} + 0.64x^2 + 0.36y^2 \color{red}+2(0.48)xy\color{black}\\ &= x^2 + y^2 \end{aligned}

So, $Rotate$ is valid $\blacksquare$ . An example of an invalid operation might be assignment. Yes, that most basic of instructions we might expect to see in a program:

q \leftarrow |0\rangle

Verboten! We can develop some intuition about this by subjecting assignment to the same procedural treatment as $Rotate$ :

\begin{aligned} |0\rangle &= x + y \\ |1\rangle &= 0 \\ \implies 1 &= \alpha_{|0\rangle}^2 + \alpha_{|1\rangle}^2 \\ &= (x + y)^2 + 0^2 \\ &= x^2 +y^2 +2xy \\ &\implies 1 + 2xy \neq 1 \end{aligned}

Now, for a non-trivial quantum operation, and perhaps the most important (? depends who u ask, where my Hadamard truthers at), the Hadamard:⁴^,⁵

Let’s observe how it behaves with some of our other operations:

And we can go a step further by enumerating the amplitudes on each of the basis states at each of the six steps of the program:

instr	$\vert000\rangle$	$\vert001\rangle$	$\vert010\rangle$	$\vert110\rangle$	$\vert111\rangle$
`init`	+1
`Toggle(q1)`	0	+1
`Hadamard(q1)`		+0.71		+0.71
`Hadamard(q3)`		+0.5	+0.5	+0.5	+0.5
`if q1 and q2 Toggle(q3)`		+0.5	+0.5	+0.5	+0.5
`Hadamard(q3)`		+0.71		+0.71

Note that – even though the $+0.5$ amplitude appears on the $|011\rangle$ basic state, the final amplitude on this state is zero because both of the branches where it appears have inverse amplitudes which cancel each other out. And herein lies the crux of quantum programming: designing algorithms where the amplitude of "undesirable" quantum states cancel out:

\begin{aligned} p\Big[q_{123} = |011\rangle\Big] = 1 &\cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \cdot 1 \cdot \frac{1}{\sqrt{2}} \\ & + 1 \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} \cdot 1 \cdot -\frac{1}{\sqrt{2}}\\ &=0 \end{aligned}

This isn’t always feasible, however. Sometimes the best we can do is leaving a subset of ”undesirable” output states with a minimal amount of amplitude and then relying on probabilistic correctness over a series of trials.

Designing Quantum Programs

Since the state of $n$ qubits can be fully described by a collection of $2^n$ amplitudes, it’s natural to model $n$ qubits as $2^n$ -dimensional vectors containing amplitudes on each of the basic states:

|\psi \rangle = \begin{bmatrix} -0.57 \\ 0.02 \\ 0.12 \\ 0.03 \\ -0.35 \\ -0.05 \\ 0.19 \\ -0.71 \end{bmatrix} \implies \sum_i |\alpha_i|^2 = 1

With a vector spanning eight dimensions (1 for each basis) corresponding to each of the basic states, we can easily visualize up to three of those dimensions. And, per the amplitude normalization constraint, we can geometrically represent the state of the qubits as a point on the $2^n$ -dimensional unit sphere. This is fun to say, but for any number of qubits greater than 1, we need at least 4-dimensional space which is more than you might be able to visualize.⁶ Before getting bogged down with $n$ -dimensional visualizations, let’s play with our linear algebraic interpretation of qubit state for $n=1$ which is nicely represented on the plane:

We can describe quantum instructions operating on qubits as matrices of transition amplitudes multiplied by the state vector.

Now, back to the harsh realities of $n > 3$ -dimensional space, suppose we have two qubits $q_1, q_2$ and we want to compute the Hadamard matrix for this system, we need a matrix of corresponding shape: $H^{\otimes 2}$ .

For $H_{q_1}$ , we can just fill in transition amplitude entries in the matrix for basis state pairs where $q_2$ doesn't change:

H_{q_1} = \frac{1}{\sqrt 2} \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ \end{bmatrix}

And, similarly we can compute $H_{q_2}$ for values where $q_1$ doesn't change:

H_{q_2} = \frac{1}{\sqrt 2} \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ \end{bmatrix}

Furthermore, we can compose them via normal matrix multiplication to compute $H^{\otimes 2} = H_{q_1}\otimes H_{q_2}$ – and order actually doesn't matter for these operations since they modify different qubits by construction:

\begin{aligned} H^{\otimes 2} &= H_{q_1}H_{q_2} = H_{q_2}H_{q_1} \\ \\\ &= \frac{1}{\sqrt 2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ \end{bmatrix} \\ \end{aligned}

and we can further simplify the notation since the magnitude of all the entries are equivalent, and all we really care about is the sign:

\begin{aligned} H^{\otimes 2} &= \frac{1}{\sqrt 2} \begin{bmatrix} + & + & + & + \\ + & - & + & - \\ + & + & - & - \\ + & - & - & + \\ \end{bmatrix} \end{aligned}

Note as well that the "convolution" of $H^{\otimes 2}$ is itself composed of $H$ :

\begin{aligned} H &= \begin{bmatrix} + & + \\ + & - \\ \end{bmatrix} \\ \implies H^{\otimes 2} &= \begin{bmatrix} H & H \\ H & -H \\ \end{bmatrix} \\ \implies H^{\otimes 3} &= \begin{bmatrix} H^{\otimes 2} & H^{\otimes 2} \\ H^{\otimes 2} & -H^{\otimes 2} \\ \end{bmatrix} \\ \vdots\\ H^{\otimes n} &= \begin{bmatrix} H^{\otimes n-1} & H^{\otimes n-1} \\ H^{\otimes n-1} & -H^{\otimes n-1} \\ \end{bmatrix} \\ \end{aligned}

This may come in handy when implementing a function to generate $H^{\otimes n}$ , though the Kronecker product ( $\otimes$ ) achieve this naturally – we'll become intimately familiar with its behavior later on.

Mystery Toggles Revisited

Finally, we can return to our mystery problem on $n$ toggles and leverage the Hadamard to improve upon our $O(n)$ classical solution.

The procedure is as follows:

initialize $n+1$ qubits in the 0 state, one for each possible toggle as well as an accumulator,
Toggle the accumulator,
Apply $H^{\otimes n - 1}$ ,
Invoke the oracle $f$ on our input string,
Apply $H^{\otimes n - 1}$ again,
Measure the amplitudes of the system to identify which cases are absent from $f$ .

To understand how and why this works, let's trace it for $n = 3$ , with $f = f_{b_2}$ (only the second toggle is present).

First, we create our $2^n = 8$ dimensional state vector representing all combinations of $b_1b_2a$ with all of the amplitude on the zero state:

|\psi \rangle = \begin{bmatrix} + \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix} \; \begin{matrix} |000\rangle \\ |001\rangle \\ |010\rangle \\ |011\rangle \\ |100\rangle \\ |101\rangle \\ |110\rangle \\ |111\rangle \\ \end{matrix}

Next, we toggle the accumulator bit, transferring all the amplitude from $|000\rangle \rightarrow |001\rangle$ .

| \psi \rangle = \begin{bmatrix} 0 \\ + \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}

Then, we apply $H^{\otimes 3}$ to the state vector:

|\psi\rangle H^{\otimes 3} = \begin{bmatrix} 0 \\ \colorbox{yellow}{$+$} \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix} \begin{bmatrix} + & + & + & + & + & + & + & + \\ \colorbox{yellow}{$+$} & \colorbox{yellow}{$-$} & \colorbox{yellow}{$+$} & \colorbox{yellow}{$-$} &\colorbox{yellow}{$+$} & \colorbox{yellow}{$-$} &\colorbox{yellow}{$+$} & \colorbox{yellow}{$-$} &\\ + & + & - & - & + & + & - & - \\ + & - & - & + & + & - & - & + \\ + & + & + & + & - & - & - & - \\ + & - & + & - & - & + & - & + \\ + & + & - & - & - & - & + & + \\ + & - & - & + & - & + & + & - \\ \end{bmatrix}

and recall that matrix multiplication with a vector with a single entry of 1 essentially acts as a select on that column s.t. our resultant product is the second row of our Hadamard matrix as a column vector:

|\psi\rangle H^{\otimes 3} = \begin{bmatrix} + \\ - \\ + \\ - \\ + \\ - \\ + \\ - \\ \end{bmatrix}

Then, we invoke $f$ on the quantum state which –if we remember that far back– toggles the accumulator iff $b_2$ in the input string is set. This has the effect of applying $f$ to each entry in the vector, examining the middle bit of each basis state:

f\big(|\psi\rangle H^{\otimes 3}\big) = \begin{bmatrix} + \\ - \\ \colorbox{yellow}{$-$} \\ \colorbox{yellow}{$+$} \\ + \\ - \\ \colorbox{yellow}{$-$} \\ \colorbox{yellow}{$+$} \\ \end{bmatrix} \begin{matrix} |000\rangle \\ |001\rangle \\ |010\rangle \\ |011\rangle \\ |100\rangle \\ |101\rangle \\ |110\rangle \\ |111\rangle \\ \end{matrix}

Note that, prior to this step, all states where the accumulator bit was 1 had negative amplitude, and all states where the accumulator bit was 0 had positive amplitude. When we invoke the oracle $f$ , the basis states where the accumulator doesn't change still have positive amplitude if the accumulator bit is unset and negative if it is set, just like before – but those basis states where the accumulator did get toggled now have the inverse amplitude pattern ( $-$ if $a = 0$ , and $+$ if $a = 1$ ) which encodes a sort of truth table for $f$ where positive amplitudes correspond to the accumulator being set.

In other words, taking the basis state $|010\rangle$ which has negative amplitude implies that if $b_1 =0, b_2 = 1, a =0$ , then $a$ will be $1$ after $f$ is called. Conversely, in $|100\rangle$ the accumulator bit is unset since it has positive amplitude.

Now we run into another problem: if we were to try to measure the qubits now, the amplitudes would collapse to pure noise since all the magnitudes would cancel each other out and we'd effectively get a random 3-bit binary string as output. To account for this, we just plop in another $H^{\otimes 3}$ and, curiously, the purpose of this operation is completely different from the columnal selection from earlier since our state vector is no longer of the correct form to pick a single entry.

Again, it's useful to visualize what happens when we perform this multiplication:

\begin{aligned} f\big(|\psi\rangle H^{\otimes 3}\big) &= \begin{bmatrix} \color{blue} + \\ \color{blue} - \\ \color{blue} - \\ \color{blue} + \\ \color{blue} + \\ \color{blue} - \\ \color{blue} - \\ \color{blue} + \\ \end{bmatrix} \begin{bmatrix} + & + & + & + & + & + & + & + \\ + & - & + & - & + & - & + & - \\ + & + & - & - & + & + & - & - \\ + & - & - & + & + & - & - & + \\ + & + & + & + & - & - & - & - \\ + & - & + & - & - & + & - & + \\ + & + & - & - & - & - & + & + \\ + & - & - & + & - & + & + & - \\ \end{bmatrix} \\ \\ & =\color{blue}+\color{black} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ \end{bmatrix} \color{blue}-\color{black} \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \\ 1 \\ -1 \\ 1 \\ -1 \\ \end{bmatrix} \color{blue}-\color{black} \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \\ 1 \\ 1 \\ -1 \\ -1 \\ \end{bmatrix} \color{blue}+\color{black} \begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \\ 1 \\ -1 \\ -1 \\ 1 \\ \end{bmatrix} \color{blue}-\color{black} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ -1 \\ -1 \\ -1 \\ -1 \\ \end{bmatrix} \color{blue}-\color{black}\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \\ -1 \\ 1 \\ -1 \\ 1 \\ \end{bmatrix} \color{blue}-\color{black}\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \\ -1 \\ -1 \\ 1 \\ 1 \\ \end{bmatrix} \color{blue}+\color{black}\begin{bmatrix} 1 \\ -1 \\ -1 \\ 1 \\ -1 \\ 1 \\ 1 \\ 1 \\ \end{bmatrix} \\ \\ &\implies \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ 2 \\ 0 \\ 2 \\ 0 \\ 2 \\ 0 \\ 2 \\ \end{bmatrix} \mapsto \begin{bmatrix} -1 \\ 1 \\ 1 \\ 3 \\ -1 \\ 1 \\ 1 \\ 3 \\ \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ 0 \\ 0 \\ 4 \\ 0 \\ 0 \\ 0 \\ 4 \\ \end{bmatrix} \mapsto \begin{bmatrix} 1 \\ 1 \\ 1 \\ 5 \\ -1 \\ -1 \\ -1 \\ 3 \\ \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ 2 \\ 0 \\ 6 \\ 0 \\ -2 \\ 0 \\ 2 \\ \end{bmatrix} \mapsto \begin{bmatrix} -1 \\ 1 \\ 1 \\ 7 \\ 1 \\ -1 \\ -1 \\ 1 \\ \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ 0 \\ 0 \\ 8 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix} \begin{matrix} |000\rangle \\ |001\rangle \\ |010\rangle \\ |011\rangle \\ |100\rangle \\ |101\rangle \\ |110\rangle \\ |111\rangle \\ \end{matrix} \\ \end{aligned}

As we can see, as we compute the matrix multiplication all the amplitude values gradually accumulate on the $|011\rangle$ basic state such that when we measure all the qubits, we learn the configuration of the mystery $f \in F_3$ that we were given with $p = 1$ . The proof that this program is correct for fixed $n$ is trivial via programmatically enumerating all variations of $f$ . This is moreso a result of the Hadamard operation placing each qubit into a state of superposition at the beginning of the algorithm, and then ejecting them from superposition after quantum computation.

If this seems like too convenient or mystically clever of a solution it's because this sequence of operations was discovered prior to the declaration of the problem statement. The problem of "mystery $f$ " was retconned after recognizing the utility of the Hadamard operation in order to motivate this demonstration of a quantum system offering constant time solution to a problem with linear complexity by classic means. This problem is known as the Bernstein-Vazirani problem, and is one of many similarly contrived examples of "quantum supremacy."

Formally, the Bernstein-Vazirani⁷ problem is posed: given an oracle $f: \{0, 1\}^n \rightarrow \{0, 1\}$ where all that's known of $f$ is that it's output is the dot product between the input vector and a secret string $s \in \{0, 1\}^n$ modulo 2, we're tasked with finding $s$ s.t.

f(x) = x \cdot s = x_1s_1 \oplus x_2s_2 \oplus \cdots \oplus x_ns_n

Illustrations and dramatization of the problem statement hopefully make this less brain numbing.

More Math

Now that the gloves are off in terms of the linear algebra, let's revisit some of the earlier handwavey definitions of qubits and quantum gates.

Qubits as vectors

A qubit can be expressed as a linear combination of the amplitudes and the system's basis states:

|\psi\rangle = a |0\rangle + b|1\rangle

where $a, b \in \mathbb C$ , though we can largely ignore the complex components of pretty much all the systems and gates we'll cover (for explanations of the global phase described by complex components, ur gonna need an ELI35). As a linear combination, quantum states are easily expressed as vectors:

\begin{aligned} |0 \rangle = \begin{bmatrix}1 \\ 0\end{bmatrix}, &\quad |1 \rangle = \begin{bmatrix}0 \\ 1\end{bmatrix} \\ \\ |\psi\rangle &= a|0\rangle + b|1\rangle \\ &= a\begin{bmatrix}1 \\ 0\end{bmatrix} + b\begin{bmatrix}0 \\ 1\end{bmatrix} \\ &= \begin{bmatrix}a \\ 0\end{bmatrix} + \begin{bmatrix}0 \\ b\end{bmatrix}\\ &= \begin{bmatrix}a \\ b\end{bmatrix} \end{aligned}

The basis states of a quantum state vector span the 2D Hilbert Space which is just a well-defined complex vector space where the inner product exists.

Normalization Condition

The aforementioned "validity" rule for quantum states is known as the normalization condition:

|a|^2 + |b|^2 = 1

For an $n$ -dimensional system, we sum over the squares of the amplitudes of all basic states:

\sum_i |\alpha_i|^2 = 1

Quantum Operations

Operations on single cubits can be expressed as 2x2 unitary and hermitian matrices. Unitary to ensure the conservation of amplitude, and Hermitian to ensure that the eigenvalues (measurable quantities which need to correspond to real world things) are real valued.⁸

A matrix $U$ is said to be unitary if its inverse equals its transpose conjugate (also known as the hermitian adjoint) $U^{\dagger}$ :

U^{\dagger}U = UU^{\dagger} = I

The hermitian adjoint operation is a linear map which transposes an $m\times n$ complex matrix and conjugates each entry (negating the complex component).

A = \begin{bmatrix} 0 & a + bi \\ 0 & 0 \\ \end{bmatrix} \implies A^{\dagger} = \begin{bmatrix} 0 & 0 \\ a - bi & 0 \\ \end{bmatrix}

For real matrices, the hermitian adjoint is just the transpose.

Bloch Sphere

Quantum operations on singular qubits can be visualized as maps between points on the unit sphere. As mentioned earlier, this also holds for systems of multiple qubits on hyper spheres but the utility of the Bloch sphere diminishes proportionately to one's ability to visualize higher dimensions and rotations therein.

Here, $\theta, \phi$ are the usual polar and azimuthal angles in spherical coordinate representation. The Pauli gates, then, are just rotations about these axes by $\pi$ radians. E.g.

\begin{aligned} \bar D(\bar n, \phi) &= \cos (\frac{\phi}{2})I - i\sin(\frac{\phi}{2})\vec\sigma \cdot \bar n \\ \bar D(\bar x, \phi = \pi) &= \cos (\frac{\pi}{2})I - i\sin(\frac{\pi}{2})\vec\sigma \cdot \bar n \\ &= 0 - i\vec\sigma_x \\ &= \sigma_x \\ \end{aligned}

again we drop the complex component since we're ignoring global phase of the quantum system, and just rotate dat boi about the x-axis (as observed in the planar example):

Similarly, this process yields a geometrically intuitive conception of the other two Pauli operations:

\begin{aligned} X &: |0\rangle \xrightarrow {\sigma_x} i|1\rangle \\ &X|0\rangle = \begin{bmatrix}0 & 1 \\ 1 &0 \end{bmatrix}\begin{bmatrix}1 \\ 0 \end{bmatrix} = \begin{bmatrix}0 \\ 1 \end{bmatrix} = |1\rangle \\ &X|1\rangle = \begin{bmatrix}0 & 1 \\ 1 &0 \end{bmatrix}\begin{bmatrix}0 \\ 1 \end{bmatrix} = \begin{bmatrix}1 \\ 0 \end{bmatrix} = |0\rangle \\ \\ Y &: |0\rangle \xrightarrow {\sigma_y} i|1\rangle \\ &Y|0\rangle = \begin{bmatrix}0 & -i \\ i &0 \end{bmatrix}\begin{bmatrix}1 \\ 0 \end{bmatrix} = \begin{bmatrix}0 \\ i \end{bmatrix} = i|1\rangle \\ &Y|1\rangle = \begin{bmatrix}0 & -i \\ i &0 \end{bmatrix}\begin{bmatrix}0 \\ 1 \end{bmatrix} = \begin{bmatrix}-i \\ 0 \end{bmatrix} = -i|0\rangle\\ \\ \\ Z &: |0\rangle \xrightarrow {\sigma_z} i|0\rangle \\ &Z|0\rangle = \begin{bmatrix}1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix}1 \\ 0 \end{bmatrix} = \begin{bmatrix}0 \\ 1 \end{bmatrix} = |0\rangle \\ &Z|1\rangle = \begin{bmatrix}1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix}0 \\ 1 \end{bmatrix} = \begin{bmatrix}0 \\ 1 \end{bmatrix} = -|1\rangle \\ \end{aligned}

Note that each of these are involutory, meaning that their squares are equal to the identity matrix. From these Pauli matrices, we can also generalize to arbitrary rotation operations:

\begin{aligned} R_x(\theta) &= \begin{bmatrix} \cos \frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin \frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix} \\ \\ R_y(\theta) &= \begin{bmatrix} \cos \frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin \frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix} \\ \\ R_z(\theta) &= \begin{bmatrix} e^{-i\frac{\theta}{2}} & 0 \\ 0 & e^{i\frac{\theta}{2}} \end{bmatrix} \\ \end{aligned}

It should be clear(er) now that all qubit operations acting on an $n$ -dimensional quantum system map a point on the hypersphere to another point, which can always be achieved via some rotation, e.g.

\begin{aligned} H = \frac{1}{\sqrt{2}}(X + Z) &= \frac{1}{\sqrt{2}}\begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix} + \begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix} \\ &=\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1 \\ 1 & -1\end{bmatrix} \end{aligned}

so the Hadamard is just a rotation about the diagonal on the xz-plane of the Bloch sphere. Two applications of a rotation by $\pi$ will complete a full revolution about the sphere, returning to the original point.

Multi-qubit Systems

Recall that singular qubits occupy 2D Hilbert Spaces:

|\psi\rangle \in \mathcal H_A, \quad |\phi\rangle \in \mathcal H_B

The state vector of a composite system of two qubits is given by the tensor product of the two constituent state vectors being combined, where the tensor product $|\Psi\rangle$ is defined as follows:

\begin{aligned} |\Psi\rangle &= |\psi\rangle \otimes |\phi\rangle \in \mathcal H_A \otimes \mathcal H_B \equiv \mathcal H_{AB} \\ &= \begin{bmatrix} \alpha \\ \beta\end{bmatrix} \otimes \begin{bmatrix} \gamma \\ \delta \end{bmatrix} = \begin{bmatrix} \alpha\begin{bmatrix} \gamma \\ \delta \end{bmatrix} \\ \beta\begin{bmatrix} \gamma \\ \delta \end{bmatrix} \end{bmatrix} \\ &=\begin{bmatrix} \alpha\gamma \\ \alpha\delta \\ \beta\gamma \\ \beta\delta \end{bmatrix} \end{aligned}

In other words, the basis vectors of a 2 qubit system in $\mathcal H_{AB}$ are comprehensively enumerated:

\begin{aligned} |00\rangle = |0\rangle \otimes |0\rangle = \begin{bmatrix}1 \\ 0 \\ 0 \\0 \end{bmatrix} \\ \\ |01\rangle = |0\rangle \otimes |1\rangle = \begin{bmatrix}0 \\ 1 \\ 0 \\0 \end{bmatrix} \\ \\ |10\rangle = |1\rangle \otimes |0\rangle = \begin{bmatrix}0 \\ 0 \\ 1 \\0 \end{bmatrix} \\ \\ |11\rangle = |1\rangle \otimes |1\rangle = \begin{bmatrix}0 \\ 0 \\ 0 \\1 \end{bmatrix} \end{aligned}

In general, the tensor product of two matrices is given by:

\begin{aligned} A \otimes B &= \begin{bmatrix} \colorbox{yellow}{$a_{11}$} & \colorbox{green}{$a_{12}$} \\ \colorbox{cyan}{$a_{21}$} & \colorbox{magenta}{$a_{22}$} \\ \end{bmatrix} \otimes \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ \end{bmatrix} = \begin{bmatrix} \colorbox{yellow}{$a_{11}$}B & \colorbox{green}{$a_{12}$}B \\ \colorbox{cyan}{$a_{21}$}B & \colorbox{magenta}{$a_{22}$}B \\ \end{bmatrix} \\ \\ &=\begin{bmatrix} \colorbox{yellow}{$a_{11}b_{11}$} & \colorbox{yellow}{$a_{11}b_{12}$} & \colorbox{green}{$a_{12}b_{11}$} & \colorbox{green}{$a_{12}b_{12}$} \\ \colorbox{yellow}{$a_{11}b_{21}$} & \colorbox{yellow}{$a_{11}b_{22}$} & \colorbox{green}{$a_{12}b_{21}$} & \colorbox{green}{$a_{12}b_{22}$} \\ \colorbox{cyan}{$a_{21}b_{11}$} & \colorbox{cyan}{$a_{21}b_{12}$} & \colorbox{magenta}{$a_{22}b_{11}$} & \colorbox{magenta}{$a_{22}b_{11}$} \\ \colorbox{cyan}{$a_{21}b_{21}$} & \colorbox{cyan}{$a_{21}b_{22}$} & \colorbox{magenta}{$a_{22}b_{21}$} & \colorbox{magenta}{$a_{22}b_{22}$} \\ \end{bmatrix} \end{aligned}

And since the tensor product is distributive, we can also write a combination of operations in such a way is explicates the fact that each quantum operator is acting independently on the composite state space:

(A \otimes B)(|\psi\rangle \otimes |\phi\rangle) = A|\psi\rangle \otimes B|\phi\rangle

With this representation of multi-qubit systems, we can introduce some multi-qubit operations.

Multi-qubit Operations

Thus far, we've only defined single-qubit gates/operations, but we can also perform some actions on multiple qubits at a time, with a few crucial distinctions from classical logic gates. For starters, quantum gates must be reversible. Whereas it's entirely common for classical logic gates to map multiple inputs to a single binary output, since quantum systems model state changes in physical systems, the modifications made to each of the inputs must also be expressed in the output. The above description is phrased as a constraint, but it's really just a natural conclusion from the prior definitions about quantum operations being unitary & hermitian – which lets us simply apply the adjoint of an operation to "undo" it.

classiscal gate	reversible?
AND	sometimes
OR	no
NOT	yes
NAND	no
XOR	no

It should be straightforward to convince yourself that the input vector to these gates is not definitively knowable given the output bit/vector.

CNOT

The Controlled-NOT gate is analogous to the the classical XOR which can be thought of as addition $\mod 2$ given by the following truth table:

$\oplus$	0	1
0	0	1
1	1	0

Observe that, for a 2 qubit system, $q_1$ remains unchanged, and the state vectors of $q_2 \in \mathcal H_{AB}$ are swapped:

\begin{aligned} |00\rangle \mapsto |00\rangle \\ |01\rangle \mapsto |01\rangle \\ |10\rangle \mapsto |11\rangle \\ |11\rangle \mapsto |10\rangle \\ \end{aligned}

Algebraically, we can express this as the matrix:

U_{CNOT} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}

We can use CNOT to swap the states of two qubits the same way we might swap two classical variables' values without introducing a tmp value with three XORs:

$b_1$	$b_2$	$b_1 \oplus b_2$	$b_1 \oplus (b_1 \oplus b_2)$
0	0	0	0
0	1	1	1
1	0	1	0
1	1	0	1

so $b_1 \oplus (b_1 \oplus b_2) \equiv b_2$ .

and, in fact, we can represent any C-gate applied to an operation $U$ with this procedure:

CU = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & u_{00} & u_{01} \\ 0 & 0 & u_{10} & u_{11} \\ \end{bmatrix}

However, as we'll see in the from-scratch implementation of quantum circuitry, the construction of Control matrices becomes far, far more involved for higher order matrices with non trivial control & target qubits (the trivial case being the 0th qubit acting as the control, and the $n$ th qubit being the target).

Toffoli

The Toffoli gate is effectively a CNOT on a 3-qubit system:

\begin{aligned} Toff &: \mathcal H_{ABC} \rightarrow H_{ABC} \\ &= (q_1, q_2, q_3) \rightarrow (q_1, q_2, q_3 \oplus q_1q_2) \end{aligned}

The Toffoli truth table indicates that $q_1, q_2$ remain unchanged, and $q_3$ only changes if both $q_1 = q_2 = 1$ :

$q_1$	$q_2$	$q_3$	$q_1'$	$q_2'$	$q_3'$
0	0	0	0	0	0
0	0	1	0	0	1
0	1	0	0	1	0
0	1	1	0	1	1
1	0	0	1	0	0
1	0	1	1	0	1
1	1	0	1	1	1
1	1	1	1	1	0

Entanglement

Recall from the section about multi-qubit states the expression:

(A \otimes B)(|\psi\rangle \otimes |\phi\rangle) = A|\psi\rangle \otimes B|\phi\rangle

When a 2-qubit composite state function can be expressed as a tensor product of the constituent qubit states like this, then the composite state is said to be separable. E.g.

\begin{aligned} |\psi\rangle \otimes |\phi\rangle &= (\alpha |0\rangle + \beta |1\rangle) \otimes (\gamma |0\rangle + \delta |1\rangle) \\ &= \alpha\gamma|00\rangle + \alpha\delta|01\rangle + \beta\gamma|10\rangle + \beta\gamma|11\rangle \\ \\ \implies \alpha_{00} &= \alpha\gamma \\ \alpha_{01} &= \alpha\delta \\ \alpha_{10} &= \beta\gamma \\ \alpha_{11} &= \beta\delta \\ \\ \implies \frac{\alpha_{00}}{\alpha_{01}} &= \frac{\alpha_{10}}{\alpha_{11}} \\ 0 &= \alpha_{00}\alpha_{11} - \alpha_{10}\alpha_{01} \end{aligned}

For example, we can show that the following quantum state is separable:

\begin{aligned} |\psi\rangle &= \frac{1}{\sqrt 2} |0\rangle + \frac{1}{\sqrt 2} |1\rangle \in \mathcal H_A \\ |\phi\rangle &= |0\rangle \in \mathcal H_B \end{aligned}

The tensor product between $|\psi\rangle$ and $|\phi\rangle$ yields the following composite state:

|\psi\rangle \otimes |\phi\rangle = \frac{1}{\sqrt 2} |00\rangle + \frac{1}{\sqrt 2} |10\rangle \in \mathcal H_{AB}

And we can verify this is separable by checking if the product of the differences of amplitudes along the major and minor diagonals cancel each other out:

\begin{aligned} \alpha_{00} &= \frac{1}{\sqrt 2} \\ \alpha_{01} &= 0 \\ \alpha_{10} &= \frac{1}{\sqrt 2} \\ \alpha_{11} &= 0 \\ \\ \implies \alpha_{00}\alpha_{11} - \alpha_{10}\alpha_{01} &= 0 \\ \frac{1}{\sqrt 2}\cdot 0 - \frac{1}{\sqrt 2} \cdot 0 &= 0 \\ \end{aligned}

which shouldn't come as any surprise since this composite state was constructed from separated states via the tensor product :p. By means of counter example, let's take a look at a special quantum state in $\mathcal H_{AB}$ :

\begin{aligned} |\Phi^+\rangle = \frac{1}{\sqrt 2} |00\rangle &+ \frac{1}{\sqrt 2} |11\rangle\\ \\ \alpha_{00} &= \frac{1}{\sqrt 2} \\ \alpha_{01} &= 0 \\ \alpha_{10} &= 0 \\ \alpha_{11} &= \frac{1}{\sqrt 2} \\ \\ \implies \alpha_{00}\alpha_{11} - \alpha_{10}\alpha_{01} &= 0 \\ \frac{1}{\sqrt 2}\cdot \frac{1}{\sqrt 2} - 0 \cdot 0 &\neq 0 \\ \end{aligned}

Since the amplitudes don't cancel out, this state is inseparable or entangled. This quantum state $|\Phi^+\rangle$ is one of four maximally entangled systems referred to as the Bell States.

\begin{aligned} |\Phi^+\rangle &= \frac{1}{\sqrt 2} (|0\rangle_A \otimes |0\rangle_B) + \frac{1}{\sqrt 2} (|1\rangle_A \otimes |1\rangle_B) \\ &= \frac{1}{\sqrt 2} |00\rangle + \frac{1}{\sqrt 2} |11\rangle\\ \\ |\Phi^-\rangle &= \frac{1}{\sqrt 2} (|0\rangle_A \otimes |0\rangle_B) + \frac{1}{\sqrt 2} (|1\rangle_A \otimes |1\rangle_B) \\ &= \frac{1}{\sqrt 2} |00\rangle - \frac{1}{\sqrt 2} |11\rangle\\ \\ |\Psi^+\rangle &= \frac{1}{\sqrt 2} (|0\rangle_A \otimes |1\rangle_B) + \frac{1}{\sqrt 2} (|1\rangle_A \otimes |0\rangle_B) \\ &= \frac{1}{\sqrt 2} |01\rangle + \frac{1}{\sqrt 2} |10\rangle\\ \\ |\Psi^-\rangle &= \frac{1}{\sqrt 2} (|0\rangle_A \otimes |1\rangle_B) + \frac{1}{\sqrt 2} (|1\rangle_A \otimes |0\rangle_B) \\ &= \frac{1}{\sqrt 2} |01\rangle - \frac{1}{\sqrt 2} |10\rangle\\ \end{aligned}

and they can be constructed with some Hadamard and CNOT gates as follows:

construction of $|\Phi^+\rangle$

\begin{aligned} |\Phi^+\rangle &= \frac{1}{\sqrt 2}(|00\rangle + |11\rangle) \\ \\ &= U_{CNOT}(H \otimes I)(|0\rangle \otimes |0\rangle) \\ &= U_{CNOT}(|0\rangle \otimes |0\rangle) = U_{CNOT}(|00\rangle) \\ &= U_{CNOT}(H|0\rangle \otimes I|0\rangle) \\ &= U_{CNOT}(|+\rangle \otimes |0\rangle) =U_{CNOT}(\frac{1}{\sqrt 2}\begin{bmatrix}1 \\ 0 \\ 1 \\ 0 \end{bmatrix}) \\ &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}\frac{1}{\sqrt 2}\begin{bmatrix}1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt 2}\begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \\ \\ &= \frac{1}{\sqrt 2} \begin{pmatrix} \begin{bmatrix}1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix}0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \end{pmatrix} = \frac{1}{\sqrt 2}(|00\rangle + |11\rangle) \end{aligned}

construction of $|\Phi^-\rangle$

\begin{aligned} |\Phi^+\rangle &= \frac{1}{\sqrt 2} |00\rangle - \frac{1}{\sqrt 2} |11\rangle\\ \\ &= (I \otimes Z)|\Phi^+\rangle \\ &= (I \otimes Z)U_{CNOT}(H \otimes I)(|0\rangle \otimes |0\rangle) \\ \end{aligned}

construction of $|\Psi^+\rangle$

\begin{aligned} |\Psi^+\rangle &= \frac{1}{\sqrt 2} |01\rangle + \frac{1}{\sqrt 2} |10\rangle\\ \\ &= (I \otimes X)|\Phi^+\rangle \\ &= (I \otimes X)U_{CNOT}(H \otimes I)(|0\rangle \otimes |0\rangle) \\ \end{aligned}

construction of $|\Psi^-\rangle$

\begin{aligned} |\Psi^-\rangle &= \frac{1}{\sqrt 2} |01\rangle - \frac{1}{\sqrt 2} |10\rangle\\ \\ &= (I \otimes Z)|\Psi^+\rangle \\ &= (I \otimes Z)(I \otimes X)U_{CNOT}(H \otimes I)(|0\rangle \otimes |0\rangle)\\ \end{aligned}

But what does it mean to be maximally entangled? Before tackling this question, or motivating even why we care, it's time to really get our hands dirty.

Dir-Ac notation

Commonly referred to as "bra-ket" notation, but it's more fun and kind of conventional to summon up some new fangled notation to become one-up on your academic peers.

Thus far we should be relatively comfortable with kets which are equivalently expressible as vectors, but what the hell was Dirac smoking with those bras. First let's consider a linear function $\ell_x$ which maps a 2-dimensional vector to a scalar by just shitting out the $x$ component of its input:

\begin{aligned} \ell_x &: \mathbb R^2 \rightarrow \mathbb R \\ \ell_x\vec v &= v_x \\ \ell_x\begin{bmatrix} v_x \\ v_y\end{bmatrix} &= v_x \\ \end{aligned}

$\ell_x$ is indeed a linear map since it satisfies the following properties:

\begin{aligned} \ell_x (\vec u + \vec x) &= \ell_x\vec u + \ell_x\vec v \\ \ell_x (c\vec v ) &= c\ell_c\vec v \end{aligned}

Linear maps of this form which take an element of our vector space and shit out a scalar are called linear functionals. The algebraic representation of $\ell_x$ will be a 2x1 matrix, AKA a row vector:

\ell_x = \begin{bmatrix}1 & 0\end{bmatrix}

All linear functionals in $\mathbb R^2$ are two-dimensional row vectors. In other words, the set of all linear functions in $\mathbb R^2$ is the set of all row vectors of the form $\begin{bmatrix}a & b\end{bmatrix}$ , and thus this set forms its own vector space known as the dual space.

Formally, given a vector space $V$ , the dual space $V^*$ is the vector space of all linear functionals in $V$ . Linear functionals find natural relevance in quantum mechanics wherever we want to measure a quantum state which is the process of "collapsing" the quantum state into a single number, a process which resembles $\ell_x : \mathbb R^2 \rightarrow \mathbb R$ .

We denote linear functionals in quantum mechanics as those functions of the dual of the Hilbert space with "bras": $\langle \phi | \in \mathcal H^*$ , operating on kets. Note how when we smush a bra together with a ket, we get something that looks an awful lot like an inner product:

\begin{aligned} \langle \phi | |\psi\rangle &= c \\ \begin{bmatrix}1 & 0\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix} &= 1 \cdot a + 0 \cdot b \equiv \begin{bmatrix}1 \\ 0\end{bmatrix}\begin{bmatrix}a \\ b\end{bmatrix} \end{aligned}

any $\ell \in \mathbb R^2$ acts on any vector and can be expressed as a dot product by translating $\ell$ into a column vector courtesy of the Riesz Representation Theorem.⁹ I guess this is like significant if you're smart enough to have mentally bifurcated row vector/matrix multiplication as different than the dot product and thus this revelation may seem inobvious or maybe even haram. But, obviously, my brain was already rotating row vectors into column vectors (or maybe the other way around, idek), without realizing the profundity of being able to do so.

For any linear functional $\ell_\phi$ , the action of $\ell_\phi$ is equivalent to taking the inner product with some unique vector $\vec \phi$ :

\ell_\phi\vec v = \vec \phi \cdot \vec v

So, it's no mistake that:

\langle \phi||\psi\rangle \equiv \langle \phi|\psi\rangle

This is useful for increasingly many reasons (as I'm realizing). Take, for example, the orthonormal basis of the vector space of a quantum system to be:

|\mathcal B_1\rangle, |\mathcal B_2\rangle, |\mathcal B_3\rangle, ...

and an arbitrary qubit of the system as:

|\psi\rangle = \sum_i \alpha_i |\mathcal B_i\rangle

Since all of the states are orthonormal, we can compute their amplitude coefficients as:

\begin{aligned} \alpha_i &= \langle \mathcal B_i |\psi\rangle \\ \\ \implies|\psi\rangle &= \sum_i \langle \mathcal B_i |\psi\rangle |\mathcal B_i \rangle \\ &= \sum_i |\mathcal B_i\rangle \langle \mathcal B_i |\psi\rangle \\ &= \underbrace{\Big(\sum_i |\mathcal B_i\rangle \langle \mathcal B_i \Big)}_{I}|\psi\rangle \\ \end{aligned}

and for any orthonormal basis, this sum must be equal to the identity matrix e.g. for some arbitrary bases:

\begin{aligned} \mathcal B_1 = \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix}, \mathcal B_2 &= \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix}, \mathcal B_3 = \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \\ \\ \sum_i |\mathcal B_i\rangle \langle\mathcal B_i| &= \sum_i \colorbox{yellow}{$\mathcal B_i$} \otimes \mathcal B_i \\ &= \sum_i \begin{bmatrix} \colorbox{yellow}{$b_{1}$}b_{1} & \colorbox{yellow}{$b_{1}$}b_{2} & \colorbox{yellow}{$b_{1}$}b_{3} \\ \colorbox{yellow}{$b_{2}$}b_{1} & \colorbox{yellow}{$b_{2}$}b_{2} & \colorbox{yellow}{$b_{2}$}b_{3} \\ \colorbox{yellow}{$b_{3}$}b_{1} & \colorbox{yellow}{$b_{3}$}b_{2} & \colorbox{yellow}{$b_{3}$}b_{3} \\ \end{bmatrix} \\ \\ &=\begin{pmatrix} \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \otimes \begin{bmatrix}1 \\ 0 \\ 0\end{bmatrix} \end{pmatrix} + \begin{pmatrix} \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \otimes \begin{bmatrix}0 \\ 1 \\ 0\end{bmatrix} \end{pmatrix} + \begin{pmatrix} \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \otimes \begin{bmatrix}0 \\ 0 \\ 1\end{bmatrix} \end{pmatrix}\\ \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \\ \\ &= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} = I \end{aligned}

which checks out since:

\begin{aligned} |\psi\rangle &= \Big(\sum_i |\mathcal B_i\rangle \langle \mathcal B_i \Big)|\psi\rangle \\ &\implies \Big(\sum_i |\mathcal B_i\rangle \langle \mathcal B_i \Big) = I \\ |\psi\rangle &= I|\psi\rangle \\ \end{aligned}

Measures of Entanglement

There are two measures of entanglement we'll consider. The first being the Positive Partial Transpose, and the second being a more comfy Von Neumann entropic measure.

Positive Partial Transpose

Let $|\Psi \rangle \in \mathcal H_{AB}$ , have a corresponding density matrix $\rho = |\Psi \rangle \langle \Psi|$ . The trace of $\rho$ (the sum of the eigenvalues, including duplicates), is given by:

Tr(\rho) = \sum_{j,k = 0, 1} \big(\langle j | \otimes \langle k|\big)|\Psi\rangle\langle \Psi|\big(| j\rangle \otimes |k\rangle\big)

which is the sum of the inner product with the basis vectors of the composite state. For an $n=2$ qubit system, we have 4 basis states, the trace of which is a scalar. The partial trace of a density matrix is the trace only on a part (or slice) of the Hilbert space, and yields what is known as the reduced density matrix:

\rho_A = Tr_B(\rho) = \sum_{k = 0, 1}\big(I \otimes \langle k|\big) |\Psi\rangle\langle \Psi| \big(I \otimes |k\rangle\big)

So, for example:

|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle

the density matrix is:

\begin{aligned} \rho &= \sum_{i,j,k,l} \alpha_{ij}\alpha^*_{kl} |ij\rangle \langle kl| \\ \implies \rho_A &= \sum_{i,j,k} \alpha_{ij}\alpha^*_{kj} |i\rangle \langle k| \end{aligned}

we can see that the partial reduced density matrix collapses the outer product between the $j, l$ kets where the indices $j =l$ , thus the density matrix of $\rho_A$ is a 2x2, whereas $\rho$ is 4x4. We can recast $\rho_A$ in terms of the outer products of $\mu$ via substitution:

\begin{aligned} \rho_A &= \sum_j\Big(\sum_i \alpha_{ij}|i\rangle \Big)\Big(\sum_k \alpha^*_{kj}\langle k| \Big) \\ &= \sum_j |\mu_j\rangle \langle \mu_j| \\ &= \sum_j p_j|\tilde\mu_j\rangle \langle \tilde\mu_j| \\ \end{aligned}

where $\tilde\mu$ is just the normalized version of $\mu$ and

\begin{aligned} |\tilde\mu_j\rangle &= \sqrt p_j |\mu_j \rangle \\ p_j &= \sum |\alpha_{ij}|^2, \quad 0 \leq p_j \leq 1 \end{aligned}

so the trace of the square of the partial density matrix is then:

\begin{aligned} Tr(\rho_A^2) &= \sum_{k', k} =p_k \langle \tilde\mu_{k'} | \tilde\mu_{k}\rangle\langle \tilde\mu_{k} | \tilde\mu_{k'}\rangle \\ &= \sum_{k', k}p_k \delta_{kk'} = \sum_k p_k \\ &= \sum_k\sum_i |\alpha_{ik}|^2 = 1 \end{aligned}

In other words, $\rho_A$ is normalized, so:

\begin{aligned} Tr(\rho_A^2) &= \sum_{k', k} = p_k \langle \tilde\mu_{k'} | \Big(|\tilde\mu_k\rangle \langle\tilde\mu| \cdot |\tilde\mu_k\rangle \langle\tilde\mu|\Big) | \tilde\mu_{k'} \rangle \\ &= \sum_{k',k} p^2_k\delta_{kk'} = \sum_k p^2_k \\ &= \sum_k\Big( \sum_i |\alpha_{ik}|^2\Big)^2 \leq 1 \end{aligned}

And so we can use the trace of the square of a partial density matrix as a relevant and bounded measure of the degree of entanglement.

For example, the partial positive transpose measure of entanglement of a separable state:

\begin{aligned} |\Psi\rangle &= |\psi\rangle \otimes |\phi\rangle \in \mathcal H_{AB}\\ \\ \rho_A = Tr_B(\rho) &= \sum_{k = 0, 1} \big(I \otimes \langle k|\big) \big(|\psi\rangle \otimes | \phi\rangle\big) \big(\langle\psi| \otimes \langle \phi|\big) \big(I \otimes |k\rangle\big)\\ &= |\psi\rangle \langle\psi| \sum_{k=0,1} \langle k | \phi \rangle\langle k | \phi \rangle \\ &= |\psi\rangle \langle\psi| \\ \implies Tr(\rho_A^2) &= 1 \end{aligned}

Which means that we have exact information about the quantum system since the reduced density matrix is a pure state given by a single outer product instead of e.g. a mixture of outer products.

An impure, or mixed state for the partial density matrix of a non-separable state would have the form:

\begin{aligned} \rho_A &= |0\rangle\langle0| + |1\rangle\langle1| + \cdots \\ \\ \implies Tr(\rho_A^2) &< 1 \end{aligned}

Perfect mixed states are those where there is zero measurable information of the system, the partial density matrix of which are proportional to the identity matrix:

\begin{aligned} \rho_A &= \frac{1}{2}I \\ \\ \implies Tr(\rho_A^2) &= \frac{1}{2} \end{aligned}

implying that we have equal likelihood of measuring a qubit's state in any of the basis states. The bounds for our measure of entanglement, then, is:

\frac{1}{2} \leq Tr(\rho^2_A)\leq 1

An example of a maximally entangled state could be any of the Bell states:

\begin{aligned} |\Phi^+\rangle &= \frac{1}{\sqrt 2} |00\rangle - \frac{1}{\sqrt 2} |11\rangle\\ \\ \rho &= |\Phi^+\rangle \langle\Phi^+|\\ &= \frac{1}{2}\Big( |0\rangle \langle 0| \otimes |0\rangle \langle 0| + |0\rangle \langle 1| \otimes |0\rangle \langle 1| + |1\rangle \langle 0| \otimes |1\rangle \langle 0| + |1\rangle \langle 1| \otimes |1\rangle \langle 1| \Big)\\ \\ \rho_A = Tr_B(\rho) &= \frac{1}{2}\Big(|0\rangle \langle 0| + |1\rangle \langle 1| \Big)\\ \\ &= \begin{bmatrix} 1/2 & 0 \\ 0 & 1/2 \end{bmatrix} \implies \rho^2_A = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/4 \end{bmatrix} \\ Tr(\rho_A^2) &= 1/2 \end{aligned}

This, if nothing else, is an illustrative example of how linear functionals behave, but is a rather painful measure of entanglement to compute by hand.¹⁰

Von Neumann Entropy

An equally valid measure of disorder of a quantum system is Von Neumann Entropy, defined as:

E = - \sum_x p(x)\log p(x)

For quantum systems $p(x)$ is equivalent to measuring the state vector $|\psi_i \rangle$ . We can show that Von Neumann entropy is equivalent to:

E(\rho) = - Tr(\rho \log \rho)

Working backwards from the classical definition, and recalling that the eigenvalues of $\rho$ correspond to the measurable quantities of our quantum system we replace $p(x) = p_i$ with the eigenvalues $\lambda_i$ :

E = - \sum_i \lambda_i \log \lambda_i

We can derive this result from the definition of the density matrix:

\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|

which is equivalent to the spectral decomposition of the matrix formed by the outer product within the sum above:

\rho = \sum_i \lambda_i |i\rangle\langle i|

This will be a diagonal matrix of the form

\begin{bmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & \lambda_n \\ \end{bmatrix}

where $n$ specifies the dimensionality of the Hilbert space of the system, and the diagonal matrix is in the basis of $|i\rangle$ . Plugging this expression of $\rho$ back into the Von Neumann definition, the sum can be written as the $Trace$ operation:

\begin{aligned} E(\rho) &= - Tr\Bigg[\sum_i \lambda_i |i\rangle\langle i| \log \Big(\sum_j \lambda_j |j\rangle\langle j|\Big)\Bigg] \\ \\ &= - Tr\Bigg[\sum_i \lambda_i |i\rangle\langle i| \Big(\sum_j \log(\lambda_j) |j\rangle\langle j|\Big)\Bigg] \\ \\ &= - Tr\Bigg[\sum_i \sum_j \lambda_i \log(\lambda_j) |i\rangle\langle i |j\rangle\langle j|\Bigg] \\ \end{aligned}

Note that $\langle i | j\rangle$ is equivalent to the Kronecker delta:

\delta_{i,j} = \begin{cases} 1 &\text{ if } i =j\\ 0 &\text{ if } i \neq j\\ \end{cases}

\begin{aligned} E(\rho) &= - Tr\Bigg[\sum_i \sum_j \lambda_i \log(\lambda_j) |i\rangle\langle j|\Bigg] \\ \\ &= - \sum_i \lambda_i \log(\lambda_i) Tr\Big(|i\rangle\langle j|\Big) \\ \\ &= - \sum_i \lambda_i \log(\lambda_i) I \\ \\ &= - \sum_i \lambda_i \log(\lambda_i) \\ \end{aligned}

And, since we know that $\lambda \in [0, 1]$ since they correspond to the probabilities of measuring a given basic state, we can note that:

\begin{aligned} \lim_{\lambda \rightarrow 0} &-\lambda \ln \lambda = 0 \\ \lim_{\lambda \rightarrow 1} &-\lambda \ln \lambda = 0 \\ \lim_{\lambda \rightarrow \frac{1}{e}} &-\lambda \ln \lambda = 1 \\ \end{aligned}

And we can sanity check this measure of entanglement as we did with the fucked up one I presented above. For a pure, separable state, we have:

\begin{aligned} \rho &= |\psi\rangle\langle \psi | \\ \lambda_1 &= 1 \\ E(\rho) &= 0 \end{aligned}

And, for a finite¹¹ Hilbert space of dimension $d$ , Von Neumann entropy is upper bounded by $\log d$ .

For example, for the near-Bell state:

|\psi \rangle = \frac{1}{\sqrt 2} \Big[ |00\rangle - |10\rangle\Big] = \frac{1}{\sqrt 2} \begin{bmatrix}1 \\ 0 \\ -1 \\0\end{bmatrix}

The density matrix, and partial density matrices thereof will be:

\begin{aligned} \rho &= |\psi \rangle \langle\psi | \\ &= \frac{1}{2}\Big[|00\rangle\langle00| - |00\rangle\langle10| - |10\rangle\langle00| + |10\rangle\langle01|\Big] \\ \\ \rho_A &= Tr_B(\rho) \\ &= \frac{1}{2}\Big[|00\rangle\langle00| - |00\rangle\langle10| - |10\rangle\langle00| + |10\rangle\langle01|\Big] \\ &= \frac{1}{2}\begin{bmatrix}1 & -1 \\ -1 & 1\end{bmatrix} \\ &\implies \lambda_{\rho_A} = \{0, 1 \} \\ \\ \rho_B &= Tr_A(\rho) \\ &= \frac{1}{2}\Big[|00\rangle\langle00| - |00\rangle\langle10| - |10\rangle\langle00| + |10\rangle\langle01|\Big] \\ &= \frac{1}{2}\begin{bmatrix}1 & -1 \\ -1 & 1\end{bmatrix} \\ &\implies \lambda_{\rho_B} = \{0, 1 \} \\ \\ \implies E(\rho_A) = E(\rho_B) &= - Tr(\rho_A \log \rho_A) = - \sum_i \lambda_i \log \lambda_i \\ &= 0 \end{aligned}

¹²

Hidden Linear Function Problem

We'll consider now another problem¹³

Given a binary lattice $G = (V, E)$ , where $n = |V|$ , a binary vector $b \in \{0, 1\}^n$ , and an $n\times n$ adjacency matrix for $G$ : $A$ . $A$ is said to be sparse in the sense that $A_{ij} = 0$ unless $i,j$ correspond to neighboring vertices on the lattice: $|i - j| \leq 1$ :

A_{ij} = \begin{cases} 0 &\text{if } |i -j| > 1 \\ \\ 0 \text{ or }1 &\text{o.w.} \end{cases}

e.g. for $n = 4^2$ we might have a grid that resembles:

These inputs $A, b$ induce a quadratic form given by:

\begin{aligned} q(x) &= \sum_{1\leq i < j}^n 2A_{ij}x_ix_j + \sum_{i=1}^nb_ix_i \mod 4 \end{aligned}

where $x = [x_1,x_2,...,x_n] \in \mathbb F_2$ are binary values. A quadratic form is a polynomial where all terms are degree two, and note that since all our variables are binary-valued, $x_i^2 = x_i$ , so that second summation, though appearing to be degree one (and thus, not quadratic, is in fact equivalent to the sum of the squares of $b_ix_i$ ).

We further restrict $q(x)$ onto the linear subspace $\mathcal L_q$ given by:

\mathcal L_q = \Big\{x \in \{0,1\}^n : q(x \oplus y) = q(x) + q(y) \; \forall y \in \{0,1\}^n \Big\}

where $\oplus$ is arithmetic on the binary ring i.e. mod 2. It's also worth noting that, by design, $\mathcal L_q$ is the null space, or kernel of $A$ – that is:

Ax = 0 \forall x \in \mathcal L_q

Together, the definitions of $q$ and $\mathcal L_q$ imply that there exists at least one vector $z \in \{0,1\}^n$ s.t.

\begin{aligned} q(x) &= 2\sum_{i = 1}^n z_ix_i, \quad\forall x \in \mathcal L_q \\ \\ &= 2z^\top x \end{aligned}

We can find all the solutions classically by exhaustively checking all $y \in \{0,1\}^n$ like so:

import numpy as np

class HLF:
	"""
	A: a symmetric adjaceny matrix of "weights" for a lattice
	b: vertice "weights"
	"""

	def __init__(self, A, b):
		self.n = A.shape[0]
		assert A.shape == (self.n, self.n)
		assert b.shape == (self.n, )

		# symmetric
		assert np.array_equal(A, A.T)

	# only neighboring cells can be connected
	for i in range(self.n):
		for j in range(self.n):
			if A[i][j] == 1:
				# I am the smartest man alive. Google and MathStackExchange BTFO'd
				# I'd like to thank Daniel Shiffman and my whiteboard
				assert (abs(abs(i - self.n) - abs(j - self.n))) % 2 == 1

	self.A = A
	self.b = b

Note that this code is adapted/corrected from an example on Google's quantumai page, but they botch a few crucial aspects of the problem constraints.

A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{bmatrix}, \quad b = \begin{bmatrix} 0 & 0 & 1 & 0 \end{bmatrix}

which specifies $q$ :

\begin{aligned} q(x) &= \sum_{1\leq i < j}^n 2A_{ij}x_ix_j + \sum_{i=1}^nb_ix_i \mod 4\\ \\ &=2(x_1x_2 + x_1x_3) + x_3 \mod 4 \end{aligned}

and the induced subset is the rather unexciting:

\mathcal L_q = \begin{Bmatrix} \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \end{Bmatrix}

which we can verify via the definition, enumerating over all of $\{0,1\}^n$ and enough LaTeX to typeset God himself.¹⁴

Now we just repeat that enumeration process above to find some $z$ which satisfies:

\begin{aligned} q(x) &= 2z^\top x, \quad \forall x \in \mathcal L_q \\ 2(x_1x_2 + x_1x_3) + x_3 &= 2z^\top x \mod 4 \\ (x_1x_2 + x_1x_3) + \frac{1}{2}x_3 &= z^\top x \mod 4 \end{aligned}

This trivially yields the solutions:

z \in \begin{Bmatrix} \begin{bmatrix}0\\0\\0\\0\end{bmatrix}, \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\\0\end{bmatrix} \end{Bmatrix}

which we can programmatically verify against all $x \in \mathcal L_q$ .

But this solution requires $O(n)$ queries to $q(x)$ since we iterate over all permutations of of $\mathbb F^n_2$ multiple times to exhaust the search space.

Applying Quantum Circuits

Using a quantum approach, the output $z \in \{ 0,1\}^n$ can be sampled from the distribution:

p(z) = \Big|\langle z| H^{\otimes n} U_q H^{\otimes n} |0^{n} \rangle\Big|^2

where $p(z) > 0$ if $z$ is a solution. $U_q$ is constructed as:

\begin{aligned} U_q &= CZ(A)S(b) \\ CZ(A) = \prod_{1 \leq i < j}^n CZ_{ij}^{A_{ij}}, &\quad S(b) = \bigotimes_{j=1}^nS_j^{b_j} \end{aligned}

Notably, this can be accomplished with a constant-depth circuit:

CODE

Let's take a look at how we might implement this circuit from scratch. First, we know we'll need some representation of our gates, so I'll just throw em all into class as static fields:

class Gates:
	H = np.array([[1, 1],
				 [1, -1])) * 1/np.sqrt(2)

	X = np.array([[0, 1],
				 [1, 0]))

	Z = np.array([[1, 0],
				 [0, -1]))

	S = np.array([[1, 0],
				 [0, 0 - 1j]))

	I = np.array([[1, 0],
				 [0, 1]))

	T = np.array([[1, 0],
				 [0, np.exp(np.pi/4 * 1j)]))


	# |0⟩⟨0|
	proj0 = np.array([[1, 0],
					 [0, 0]))

	# |1⟩⟨1|
	proj1 = np.array([[0, 0],
					 [0, 1]))

We can apply these gates to some basic states like so:

c0 = 0 + 0j
c1 = 1 + 0j

ket0 = np.array([c1, c0])
ket1 = np.array([c0, c1])

print(f"X|0> = {Gates.X @ ket0}, X|1> = {Gates.X @ ket1}")
print(f"H|0> = {Gates.H @ ket0}, H|1> = {Gates.H @ ket1}")

X|0> = [0.+0.j 1.+0.j], X|1> = [1.+0.j 0.+0.j]
H|0> = [0.707+0.j 0.707+0.j], H|1> = [ 0.707+0.j -0.707+0.j]

and it will be useful to write a helper to plot them¹⁵ as vectors in the complex plane:

def plt_state_as_vecs(psi):
	ket_alpha, ket_beta = psi
	fig, axx = plt.subplots(1,2, figsize=(7,3))
	ax = axx[0]
	ax.axhline(0,ls='--',color='grey',alpha=0.2)
	ax.axvline(0,ls='--',color='grey',alpha=0.2)
	ax.plot(0, 0, color='blue',label=r'$\vert 0 \rangle$')
	ax.plot(0, 0, color='red',label=r'$\vert 1 \rangle$')
	ax.arrow(0,0,ket_alpha.real, ket_alpha.imag, head_width=0.05, head_length=0.1, color='blue')
	ax.arrow(0,0,ket_beta.real, ket_beta.imag, head_width=0.05, head_length=0.1,
	color='red')

	ax.set_xlim([-1,1])
	ax.set_ylim([-1,1])
	ax.set_xlabel("Real")
	ax.set_ylabel("Complex")
	ax.legend(loc=0)

	ax = axx[1]
	pr_zero, pr_one = np.absolute(psi[0])**2, np.absolute(psi[1])**2
	ax.bar([0,1],[pr_zero, pr_one])
	ax.set_xticks([0,1])
	ax.set_xticklabels([r'$\vert 0 \rangle$',r'$\vert 1 \rangle$'])
	ax.set_ylim(0,1)
	ax.grid(True)

	plt.tight_layout()

e.g. we can visualize how the Hadamard places a basic state into superposition:

plt_state_as_vecs(ket1)
plt_state_as_vecs(Gates.H @ ket1)

And we'll also want a generic plotting function to visualize the probability distribution over an arbitrarily-sized quantum system:

def plt_measure(out_register):
	n_qubits = int(np.log2(out_register.shape[0]))
	fig, ax = plt.subplots(1,1)
	ax.bar(range(2**n_qubits), basis_state_probs(out_register))
	ax.set_xticks(range(2**n_qubits))
	ax.set_xticklabels(stringify(n_qubits), rotation=45)
	ax.set_ylim(-0,1)
	ax.grid(True)
	ax.set_ylabel(r'$P(S_c)$')
	ax.set_xlabel(r'$S_c$')
	plt.show()

In order to construct an $n$ -dimensional system of qubits, we'll want to define how gates compose. This can be a static helper function of the Gate class defined above:

def Compose(gates):
	return ft.reduce(np.kron, gates, np.array([1]))

which produces a matrix on the order of the product of all dimensions of gates. Now, we can construct a system of four qubits and visualize it like so:

plt_measure(Gates.Compose([ket0, ket0, ket0, ket0]))

Or, non-graphically, we might just want to spit out the states and their corresponding likelihoods of observations:

def basis_state_probs(state_vector):
	return np.array([np.absolute(s)**2 for s in state_vector])

def stringify(n_qubits=3):
	state_strs = ['' for _ in range(2**n_qubits)]
	basis_strs = ['0', '1']

	for q in range(n_qubits):
		for i in range(len(state_strs)):
			b = basis_strs[((i//(2**q))) % 2]
			state_strs[i] = state_strs[i] + b
	return state_strs

out = Gates.Compose([ket0, ket0])
list(zip(stringify(out.shape[0]), basis_state_probs(out)))

which nicely displays the 2-qubit system initialized to the zero state:

[
	('0000', 1.0),
	('1000', 0.0),
	('0100', 0.0),
	('1100', 0.0)
]

We can take a stab at building that Toffoli circuit from earlier, noting that –even if an input qubit is not accessed or modified during an operation, we still need to maintain a uniform dimension of each moment of the circuit.¹⁶. Recall that the Toffoli gate relies on the CNOT matrix. Rather than just implementing a CNOT gate for $n=3$ qubits, though, we probably want to generalize a construction method for arbitrary $n$ . This is actually much easier said than done, but worth the effort.

Consider the general purpose 2x2 control gate defined by:

CU = |0\rangle\langle0| \otimes I + |1\rangle\langle1| \otimes U

\begin{aligned} CU_{1, 3} &= |0\rangle\langle0| \otimes I_{4\times4} + |1\rangle\langle1| \otimes (I \otimes U) \\ &= \begin{bmatrix} I_{4\times 4} & 0_{4\times 4} \\ 0_{4\times 4} & (I \otimes U) \\ \end{bmatrix}\\ &= \begin{bmatrix} I & 0_2 & 0_2 & 0_2 \\ 0_2 & I & 0_2 & 0_2 \\ 0_2 & 0_2 & U & 0_2 \\ 0_2 & 0_2 & 0_2 & U \\ \end{bmatrix} \end{aligned}

We can expand this expression and label each term for clarity:

\begin{aligned} CU_{1, 3} &= \underbrace{|0\rangle\langle0|}_{\text{control}} \otimes \underbrace{I}_{\text{uninvolved}} \otimes \underbrace{I}_{\text{target}} + \underbrace{|1\rangle\langle1|}_{control} \otimes \underbrace{I}_{\text{uninvolved}} \otimes \underbrace{U}_{\text{target}} \\ \\ &=\begin{bmatrix} I & 0_2 & & & \\ 0_2 & I & & & \\ & & & &\\ & & & & \end{bmatrix} + \begin{bmatrix} & & & &\\ & & & & \\ & & & U & 0_2 \\ & & & 0_2 & U \end{bmatrix} \end{aligned}

And if we swap the control and target indices, we can better understand the "role" of the uninvolved terms which effectively pad out the dimensionality:

\begin{aligned} CU_{3, 1} &= \underbrace{I}_{\text{target}} \otimes \underbrace{I}_{\text{uninvolved}} \otimes \underbrace{|0\rangle\langle0|}_{\text{control}} + \underbrace{U}_{\text{target}} \otimes \underbrace{I}_{\text{uninvolved}} \otimes \underbrace{|1\rangle\langle1|}_{\text{control}} \\ \\ &=\begin{bmatrix} |0\rangle\langle0| & & &\\ & |0\rangle\langle0| & & \\ & & |0\rangle\langle0| & \\ & & & |0\rangle\langle0| \end{bmatrix} + \begin{bmatrix} & & & |1\rangle\langle1| &\\ & & & & |1\rangle\langle1| \\ |1\rangle\langle1| & & &\\ & |1\rangle\langle1| & & &\\ \end{bmatrix} \end{aligned}

Thinking about the control gate in terms of matrix addition is the only way I was able to think about this problem.

# within Gate class again

def Control(n, U, control, target):
	"""
	This is the most cracked piece of software I've ever written.

	U: a unitary 2x2 matrix
	control: the index of the control qubit
	target: the index of the target qubit
	n: the number of qubits in the circuit

	returns CU(i, j): an NxN matrix constructed from some permutation of:

		[|0⟩⟨0| ⊗ I          ⊗ I]   + [|1⟩⟨1|  ⊗ I         ⊗ U]
		control, uninvolved, target + control, uninvolved, target

	s.t. `control` and `target` align with their respected indices, and the
	"uninvolved" identities are kroneckered to pad out the necessary dimensions
	"""
	left_ops = []
	for i in range(n):
		if i == control: 	left_ops += [Gates.proj0]
		else: 			 	left_ops += [Gates.I]

	right_ops = []
	for i in range(n):
		if i == control: 	right_ops += [Gates.proj1]
		elif i == target: 	right_ops += [U]
		else: 				right_ops += [Gates.I]

	left = ft.reduce(np.kron, left_ops, np.array([1]))
	right = ft.reduce(np.kron, right_ops, np.array([1]))

	return left + right

So now we can easily express the Toffoli as a sequence of matrix multiplications applied to a state vector:

g1 = Gates.Compose([Gates.I, Gates.I, Gates.H])
g2 = Gates.Compose([Gates.I, Gates.Control(2, Gates.X, 0, 1)])
g3 = Gates.Compose([Gates.I, Gates.I, dagger(Gates.T)])
# note that we don't need to `Compose` when our gate is already appropriately sized
g4 = Gates.Control(3, Gates.X, 0, 2)
g5 = Gates.Compose([Gates.I, Gates.I, Gates.T])
g6 = Gates.Compose([Gates.I, Gates.Control(2, Gates.X, 0, 1)])
g7 = Gates.Compose([Gates.I, Gates.I, dagger(Gates.T)])
g8 = Gates.Control(3, Gates.X, 0, 2)
g9 = Gates.Compose([Gates.I, Gates.T, Gates.T])
g10 = Gates.Compose([Gates.Control(2, Gates.X, 0, 1), Gates.H])
g11 = Gates.Compose([Gates.T, dagger(Gates.T), Gates.I])
g12 = Gates.Compose([Gates.Control(2, Gates.X, 0, 1), Gates.I])

gs = [g1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11, g12]
toffoli = g1 @ g2 @ g3 @ g4 @ g5 @ g6 @ g7 @ g8 @ g9 @ g10 @ g11 @ g12
in_states = [ket1, ket1, ket1]
out = register @ toffoli

immediately we can also observe that chaining together multiplications is dumb, so we can implement another static composition utility within our Gate class:

def compose_circuit(moments):
	n_qubits = int(np.log2(moments[0].shape[0]))
	return ft.reduce(np.dot, moments, np.identity(2**n_qubits))

and verify it works just the same via:

toffoli_ = Gates.compose_circuit(gs)
toffoli_ == toffoli

Let's apply these gates and composition methods to solve the Hidden Linear Function problem.

We'll instantiate the problem params:

A = np.array([
	[0, 1, 1, 0],
	[1, 0, 0, 0],
	[1, 0, 0, 0],
	[0, 0, 0, 0]
])
b = np.array([0, 0, 1, 0])
n = A.shape[0]

and define our circuit in terms of the moments described in our sketch from earlier:

# each moment needs to compose to an n x n gate
qubits = [ket0 for i in range(n)]

H1 = Gates.Compose([Gates.H for q in qubits])

CZs = []
for i in range(n):
	for j in range(n):
		if A[i][j] and i < j:
			CZs.append(Gates.Control(n, Gates.Z, i, j))

S_b = np.array([1])
for b_i in b:
	if b_i == 1: 	S_b = np.kron(S_b, Gates.S)
	else: 			S_b = np.kron(S_b, Gates.I)

H4 = Gates.Compose([Gates.H for q in qubits])
HLF = Gates.compose_circuit([H1, *CZs, S_b, H4])
reg = Gates.Compose(qubits)
out = reg @ HLF
for p_out in list(zip(stringify(n), basis_state_probs(out))):
	print(p_out)

plt_measure(out)

which shows us that each of the following are valid solutions:

which does indeed match our hand-computed choices of $z$ :

z \in \begin{Bmatrix} \begin{bmatrix}0\\0\\0\\0\end{bmatrix}, \begin{bmatrix}1\\0\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\\0\end{bmatrix}, \begin{bmatrix}1\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\0\end{bmatrix}, \begin{bmatrix}1\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\1\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\1\\0\end{bmatrix} \end{Bmatrix}

(note that the labels of the graph are in reverse order – exercise for the reader or whatever). We can doubly sanity check our solution by implementing the corrected instantiation of the problem using Google's cirq API:

def generate_cirquit(A, b):
	n = A.shape[0]
	qubits = cirq.LineQubit.range(n)
	circuit = cirq.Circuit()

	# Hadamards
	circuit += cirq.Moment([cirq.H(q) for q in qubits])

	# CZ
	for i in range(n):
		for j in range(n):
			if A[i, j] and i < j:
				circuit += cirq.CZ(qubits[i], qubits[j])

	# S
	circuit += cirq.Moment([cirq.S(qubits[i]) for i in range(n) if b[i))

	# Hadamards again
	circuit += cirq.Moment([cirq.H(q) for q in qubits])

	# Measurements
	circuit += cirq.Moment([cirq.measure(qubits[i], key=str(i)) for i in range(n)])

counts = {}
for _ in range(100):
	z = tuple(solve_problem(A, b, print_circuit=False))
	if z in counts: counts[z] += 1
	else:			counts[z] = 1

which outputs a sampled distribution over the same $z$ 's we just found.

HLF wrap up

Again, this problem is psychopathic because it only exists as a variant of the Bernstein-Vazirani problem to prove that quantum circuits can be used to solve classically exponential sized problems with a constant depth of quantum gates.

Bernstein-Vazirani Solution

We can now also easily compose & check a quantum implementation of the Bernstein-Vazirani algorithm. First, the cirq benchmark:

def generate_circuit_f(n, f=None):
	qubits = cirq.LineQubit.range(n+1)
	circuit = cirq.Circuit()

	# H
	circuit += cirq.Moment([cirq.H(q) for q in qubits])

	# U
	if f is None:
		f = np.random.randint(2, size=n)
	print(f"actual f: {f}")
	for i in range(len(f)):
		if f[i] == 1: circuit += cirq.CNOT(qubits[i], qubits[n])

	# H again
	circuit += cirq.Moment([cirq.H(q) for q in qubits])

	# Measurements
	circuit += cirq.Moment([cirq.measure(qubits[i], key=str(i)) for i in range(n+1)])

	return circuit

n = 5
sim = cirq.CliffordSimulator()
circ = generate_circuit_f(n)
print(circ)
result = sim.simulate(circ, initial_state=1)
f = np.array([result.measurements[str(i)][0] for i in range(n+1)])
print("computed f:", f)

which produces the following circuit & output:

actual f: [1 1 1 0 1]
0: ───H───@───────────────H───M('0')───
          │
1: ───H───┼───@───────────H───M('1')───
          │   │
2: ───H───┼───┼───@───────H───M('2')───
          │   │   │
3: ───H───┼───┼───┼───────H───M('3')───
          │   │   │
4: ───H───┼───┼───┼───@───H───M('4')───
          │   │   │   │
5: ───H───X───X───X───X───H───M('5')───
computed f: [1 1 1 0 1 1]

Real Problems Have No Solutions

Microsoft just patted themselves on the back so hard they dislodged a rib,¹⁷ announcing a new "topological" chip which offers a path to million-qubit systems. This number is significant since that's the order of magnitude required for Shor's algorithm to be remotely viable.¹⁸^,¹⁹
And even then, there are already provisions for "post-quantum" crypto.²⁰

Footnotes

Accounting for cosmic ray bit flips on board Voyager-2 ↩
This is a base joke. ↩
Explain Like I’m 25 And I Don’t Know Any Linear Algebra ↩
Q.v. Hadamard code in astronomical error correcting codes ↩
Q.v. Geometric Algebra ↩
Not me tho, I'm a certified rotator ↩
Ethan Bernstein and Umesh Vazirani. "Quantum Complexity Theory." SIAM Journal on Computing Vol. 26, No. 5. 1997. ↩
that's not to say that complex values have no "real world" significance. I'd be loath to perpetuate the notion of "imaginary" numbers being just that – it's just that, for our intents and purposes lol, the complex components are imaginary, negligible, and we just need some guard rails to ensure this model doesn't fly off the rails to silly town aka imaginaryville. ↩
Riesz, F. (1909). "Sur les opérations fonctionnelles linéaires". Comptes rendus de l'Académie des Sciences. Iss 149: 974–977. ↩
One of the passive observations I've made about quantum mechanics after putting together all this information is that it is incredibly tedious to trace any sort of sequence of operations on any quantum system with $n >3$ qubits since the size of the matrices involved is $2^n$ and quickly gets untenable. So, while at first the notation seems spooky and foreign, it grows on you rather quickly after scratching out a few 16x16 matrix multiplications... ↩
oh yeah, I should mention that there are infinitely dimensional quantum systems ↩
we take $0 \log 0 = 0$ for convenience, which is consistent in the limit $\lambda_i \rightarrow 0^+$ . ↩
Sergey Bravyi et al. "Quantum advantage with shallow circuits." ArXiv, April 2017. ↩
literally had to drop this stupid enumeration because it was too much latex to render on the web ↩
Appropriated from @cjbayesian ↩
https://quantumai.google/cirq/build/circuits ↩
https://news.microsoft.com/source/features/ai/microsofts-majorana-1-chip-carves-new-path-for-quantum-computing/ ↩
David Aasen et al. "Roadmap to fault tolerant quantum computation using topological qubit arrays." ArXiv, February 2025. ↩
Gidney, Craig and Ekereå Martin. "How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits." ArXiv, April 2021. ↩
FIPS 203. Module-Lattice-Based Key-Encapsulation Mechanism Standard. ↩

Useless Solutions Call for Useless Problems

Tags

Previous Article

Next Article