Notes on Information Theory


Notes on Information Theory from various sources


Lectures

1 | Introduction to Coding Theory

  • Shannon's seminal paper "A Mathematical Theory of Communication" (1948) measured information content in the output of a random source in terms of its entropy
  • In it, he presented the Noiseless Coding Theorem: given $n$ independent and identically distributed random variables, each with entropy $H(X)$...
    • the output can be compressed into $n(H(X) + \epsilon)$ bits with negligible loss
    • or alternatively compressed into $n(H(X) - \epsilon)$ bits entailing certain loss
    • Given a noisy communication channel whose behavior is given by a stochastic channel law, and considering a message of $k$ bits output by a source coder, the theorem states that every such channel has a precisely defined real-numbered capacity that quantifies the maximum rate of reliable communication
    • Given a noisy channel with capacity $C$, if information is transmitted at rate $R$ (meaning $k = nR$ message bits are transmitted in $n$ uses of the channel), then if $R < C$, there exist coding schemes (encoding and decoding pairs) that guarantee negligible probability of miscommunication, whereas if $R > C$, then, regardless of the coding scheme, the probability of an error at the receiver is bounded below by some constant (which increases with $R$)
      • Conversely, when $R > C$, the probability of miscommunication goes exponentially to 1 in $k$

1 - A Simple Code

Suppose we need to store 64-bit words in such a way that they can be recovered even if a single bit per word gets flipped.

  • We could duplicate each bit at least three times, thus storing only about 21 bits of information per 64-bit word and limiting ourselves to roughly 1/3 of the capacity. The benefit of this tradeoff is that we could correct any single bit flip, since the majority value of the three copies gives the correct value of the bit if one of them is flipped.

In 1949, Hamming proffered his code to correct 1-bit errors using fewer redundant bits. Chunks of 4 information bits $x_1, x_2, x_3, x_4$ get mapped (encoded) to a codeword of 7 bits as

$$x_1, x_2, x_3, x_4, \; x_2 \oplus x_3 \oplus x_4, \; x_1 \oplus x_3 \oplus x_4, \; x_1 \oplus x_2 \oplus x_4$$

which can be equivalently represented as a mapping from $x$ to $Gx$ (operations done $\bmod 2$) where $x$ is the column vector $[x_1, x_2, x_3, x_4]^T$ and $G$ is the matrix

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}$$
  • Two distinct 4-bit vectors $x, y$ get mapped to codewords $Gx, Gy$ which differ in at least 3 bits (define $w = x - y$, $w \neq 0$, and check that, for each non-zero $w$, $Gw$ has at least 3 bits which are $1$; this is sketched in code below)
  • This code can correct all single bit flips since, for any 7-bit vector, there is at most 1 codeword which can be obtained by a single bit flip
  • For any $y \in \{0, 1\}^7$ which is not a codeword, there is always a codeword which can be obtained by a single bit flip
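
A minimal sketch of this encoding (my code, not from the notes), multiplying $G$ by each message mod 2 and checking that every non-zero $w$ yields $Gw$ of weight at least 3:

```python
# Encode 4-bit messages with the [7,4] Hamming generator matrix above,
# then verify the distance claim: wt(Gw) >= 3 for every non-zero w.
import itertools

G = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]

def encode(x):
    """Map a 4-bit message x to its 7-bit codeword Gx (mod 2)."""
    return [sum(g * xi for g, xi in zip(row, x)) % 2 for row in G]

assert all(sum(encode(w)) >= 3
           for w in itertools.product([0, 1], repeat=4) if any(w))
```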

Basic Definitions

Hamming Distance

The Hamming Distance between two strings $x, y$ of the same length over the alphabet $\Sigma$ is denoted $\Delta(x,y)$ and defined as the number of positions at which $x$ and $y$ differ:

$$\Delta(x,y) = |\{ i \mid x_i \neq y_i \}|$$

The fractional or relative Hamming Distance between $x, y \in \Sigma^n$ is given by

$$\delta(x,y) = \frac{\Delta(x,y)}{n}$$
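
A quick sketch (mine) of the two distance measures just defined:

```python
# Hamming distance: count of positions where x and y differ;
# relative distance: the same, normalized by the length n.
def hamming_distance(x, y):
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

def relative_distance(x, y):
    return hamming_distance(x, y) / len(x)

print(hamming_distance("10110", "11010"))   # 2
print(relative_distance("10110", "11010"))  # 0.4
```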

Hamming Weight

The Hamming Weight is the number of non-zero symbols in a string $x$ over alphabet $\Sigma$:

$$\begin{aligned} w(x) &= |\{ i \mid x_i \neq 0 \}| \\ w(x-y) &= \Delta(x, y) \end{aligned}$$

Hamming Ball

A Hamming Ball of radius $r$ around a string $x \in \Sigma^n$ is given by the set

$$\{y \in \Sigma^n \mid \Delta(x,y) \leq r\}$$

Code

An error correcting code $C$ of length $n$ over a finite alphabet $\Sigma$ is a subset of $\Sigma^n$.

  • Elements of $C$ are codewords in $C$, and the collection of all the codewords is sometimes called a codebook.
  • The alphabet of $C$ is $\Sigma$, and if $|\Sigma| = q$, we say that $C$ is $q$-ary (binary, ternary, etc.)
  • The length $n$ of codewords in $C$ is called the block length of $C$
  • Associated with a code is also an encoding map $E$ which maps the message set $\mathcal M$, identified in some canonical way with $\{1, 2, ..., |C| \}$, to codewords in $\Sigma^n$
    • The code is then the image of the encoding map

Rate

The rate of a code reflects the amount of redundancy it introduces:

$$R(C) = \frac{\log |C|}{n \log |\Sigma|}$$

which is the amount of non-redundant information per symbol in codewords of $C$.

  • The dimension of $C$ is given by $\frac{\log |C|}{\log |\Sigma|}$
  • A $q$-ary code of dimension $\ell$ has $q^\ell$ codewords

Distance

  • The distance of an error correcting code is the measurement of error resilience quantified in terms of how many errors need to be introduced to confuse one codeword with another
  • The minimum distance of a code $C$, denoted $\Delta(C)$, is the minimum Hamming distance between two distinct codewords of $C$:

$$\Delta(C) = \min_{c_1, c_2 \in C \atop c_1 \neq c_2} \Delta(c_1, c_2)$$

  • For every pair of distinct codewords in $C$, the Hamming distance is at least $\Delta(C)$
  • The relative distance of $C$, denoted $\delta(C)$, is the normalized quantity $\frac{\Delta(C)}{n}$ where $n$ is the block length of $C$. Thus, any two codewords of $C$ differ in at least a fraction $\delta(C)$ of positions
  • For example, the parity check code, which maps $k$ bits to $k+1$ bits by appending the parity of the message bits, is an example of a distance 2 code with rate $k/(k+1)$

Properties of Codes

The following statements about a code $C$ are equivalent:

  1. $C$ has a minimum distance of $2t + 1$
  2. $C$ can be used to correct all $t$ symbol errors
  3. $C$ can be used to detect all $2t$ symbol errors
  4. $C$ can be used to correct all $2t$ symbol erasures – that is, some symbols are erased, and we can determine the locations of all erasures and fill them in using the remaining symbols and the code's redundancy

An Aside about Galois Fields

  • A finite or Galois Field is a finite set on which addition, subtraction, multiplication, and division are defined and correspond to the same operations on the rationals and reals
  • A field is denoted as $\mathbb F_q$ or $\mathbf{GF}_q$ where $q$ is the order or size of the field, given by the number of elements within
    • A finite field of order $q$ exists if and only if $q$ is a prime power $p^k$, where $p$ is prime and $k$ is a positive integer
    • In a field of order $p^k$, adding $p$ copies of any element always results in zero, meaning that the characteristic of the field is $p$
    • All fields of order $q = p^k$ are isomorphic, so we can unambiguously refer to them as $\mathbb F_q$
  • $\mathbb F_2$, then, is the field with the least number of elements; it is unique up to isomorphism, with the additive and multiplicative identities denoted $0, 1$ respectively
  • This should look pretty familiar from bitwise logic, as these operations correspond to modulo 2 arithmetic: addition is the boolean XOR, and multiplication is the boolean AND:

| XOR (+) | 0 | 1 |
|---------|---|---|
| **0**   | 0 | 1 |
| **1**   | 1 | 0 |

| AND (·) | 0 | 1 |
|---------|---|---|
| **0**   | 0 | 0 |
| **1**   | 0 | 1 |
  • Other properties of binary fields
    • Every element $x$ of $\mathbb F_2$ satisfies $x + x = 0; \; -x = x$
    • Every element $x$ of $\mathbb F_2$ satisfies $x^2 = x$

3 - Linear Codes

General codes may have no structure, but of particular interest to us is a subclass of codes with additional structure, called Linear Codes, defined over alphabets $\Sigma$ which are finite fields $\mathbb F_q$ where $q$ is a prime, in which case $\mathbb F_q = \{0, 1, ..., q-1\}$ with addition and multiplication defined modulo $q$.

  • Formally, if $\Sigma$ is a field and $C \subset \Sigma^n$ is a subspace of $\Sigma^n$, then $C$ is said to be a Linear Code
  • As $C$ is a subspace, there exists a basis $c_1, c_2, ..., c_k$ where $k$ is the dimension of the subspace. Any codeword can be expressed as a linear combination of these basis vectors. We can write these vectors in matrix form as the columns of an $n \times k$ matrix called a generator matrix

Generator Matrices and Encoding

  • Let $C \subseteq \mathbb F_q^n$ be a linear code of dimension $k$. A matrix $G \in \mathbb F^{n \times k}_q$ is said to be a generator matrix for $C$ if its $k$ columns span $C$
  • It provides a way to encode a message $x \in \mathbb F^k_q$ (a column vector) as the codeword $Gx$, a linear transformation from $x$ to $Gx$
  • Note that a Linear Code admits many different generator matrices, corresponding to the different choices of bases for the code as a vector space
  • A $q$-ary code block of length $n$ and dimension $k$ will be referred to as an $[n, k]_q$ code
    • If the code has distance $d$, it may be expressed as $[n, k, d]_q$, and when the alphabet size is obvious or irrelevant, the subscript may be omitted

Aside about Parity-Bit Check Codes

A simple Parity-bit Check code uses an additional bit denoting the even or odd parity of a message's Hamming Weight, appended to the message to serve as a mechanism for error detection (but not correction, since it provides no means of determining the position of the error).

  • Such methods are inadequate for multi-bit errors of matching parity, e.g., 2 bit flips might still pass a parity check despite a corrupted message:
| data    | Hamming Weight | Code (even parity) | (odd parity) |
|---------|----------------|--------------------|--------------|
| 0000000 | 0              | 0000000**0**       | 0000000**1** |
| 1010001 | 3              | 1010001**1**       | 1010001**0** |
| 1101001 | 4              | 1101001**0**       | 1101001**1** |
| 1111111 | 7              | 1111111**1**       | 1111111**0** |

Application

  • Suppose Alice wants to send a message $m = 1001$
  • She computes the parity of her message: $p_{even}(m) = 1 + 0 + 0 + 1 \pmod 2 = 0$
  • And appends it to her message: $m' = m | p(m) = 10010$
  • And transmits it to Bob: $A \xrightarrow{m'} B$
  • Bob receives $m'$ with questionable integrity and computes the parity for himself: $1 + 0 + 0 + 1 + 0 \pmod 2 = 0$ and observes the expected even-parity result
  • This same process works under an odd-parity scheme as well; a sketch of the exchange follows
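
A small sketch of this exchange (my code; names are illustrative):

```python
def parity(bits):
    """Even-parity bit: 0 if the Hamming weight is even, else 1."""
    return sum(bits) % 2

m = [1, 0, 0, 1]
m_prime = m + [parity(m)]       # Alice appends p_even(m) = 0 -> 10010

# Bob recomputes parity over all received bits; 0 means "looks consistent".
assert parity(m_prime) == 0     # assuming an error-free channel here

# A single bit flip is detected...
corrupted = m_prime.copy(); corrupted[2] ^= 1
assert parity(corrupted) == 1
# ...but a second flip restores matching parity and goes undetected.
corrupted[3] ^= 1
assert parity(corrupted) == 0
```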

Examples of Codes

  • $[n, n-1, 2]_2$ corresponds to the binary parity check code consisting of all vectors in $\mathbb F^n_2$ of even Hamming Weight
  • $[n, 1, n]_2$ is a binary repetition code consisting of the two vectors $0^n, 1^n$
  • $[7,4,3]_2$ is the linear Hamming Code discussed above
  • If $H$ is the parity check matrix of a Linear Code $C$, then $\Delta(C)$ is the minimum number of columns of $H$ that are linearly dependent

Hamming Codes Revisited

  • Hamming Codes can be understood by the structure of their parity check matrices, which allows us to generalize them to codes of larger lengths
  • $C_\text{Ham} = [7,4,3]_2$ is a Hamming Code with generator matrix $G$ and parity check matrix $H$ (whose columns, read as binary numbers, are $1, 2, ..., 7$):

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}$$

and we can see that $HG = \mathbf 0$.

Correcting single errors with Hamming codes

  • Suppose that $y$ is a corrupted version of some unknown codeword $c \in C$ with a single error (bit flip)
  • We know that, by the distance property of $C_\text{Ham}$, $c$ is uniquely determined by $y$. We could determine $c$ by flipping each bit of $y$ and checking whether the resulting vector is in the null space of $H$
    • Recall that the null space of a matrix $A$ is the set of all $n$-dimensional column vectors $x$ such that $Ax = \mathbf 0$
  • However, a more efficient correction can be performed. We know that $y = c + e_i$, where $e_i$ is the column vector of all zeros except for a single 1 in the $i$th position. Note that $Hy = H(c + e_i) = Hc + He_i = He_i$, the $i$th column of $H$, which is the binary representation of $i$; this recovers the location $i$ of the error
  • Syndrome: We say that $Hy$ is the syndrome of $y$; a decoding sketch follows
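
A sketch of syndrome decoding for the $[7,4,3]$ code (my code, using the $H$ above; reading the syndrome as a binary number gives the error position):

```python
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(y):
    """Hy mod 2: the zero vector iff y is a codeword."""
    return [sum(h * yi for h, yi in zip(row, y)) % 2 for row in H]

def correct(y):
    s = syndrome(y)
    i = s[0] * 4 + s[1] * 2 + s[2]   # column i of H = binary rep of i
    if i:                            # non-zero syndrome: flip bit i
        y = y.copy()
        y[i - 1] ^= 1
    return y

c = [1, 0, 0, 1, 1, 0, 0]            # a codeword: Hc = 0
y = c.copy(); y[4] ^= 1              # corrupt bit 5
assert syndrome(y) == [1, 0, 1]      # binary 101 = 5
assert correct(y) == c
```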

Generalized Hamming Codes

We define $H_r$ to be the $r \times (2^r - 1)$ matrix where column $i$ of $H_r$ is the binary representation of $i$. This matrix must contain $e_1$ through $e_r$, the binary representations of the powers of 2 from $1$ to $2^{r-1}$, and thus has full row rank.

  • The $r$th generalized Hamming Code, then, is given by

$$C^{(r)}_\text{Ham} = \{ c \in \mathbb F^{2^r-1}_2 \mid H_r c = \mathbf 0 \}$$

and is the binary code with the parity check matrix $H_r$.

  • Note that $C^{(r)}_\text{Ham}$ is a $[2^r - 1, 2^r - 1 - r, 3]_2$ code
  • Proof:
    • Since $H_r$ has rank $r$, $\dim(C^{(r)}_\text{Ham}) = 2^r - 1 - r$, and we must simply check that no two columns of $H_r$ are linearly dependent, and that there are three linearly dependent columns in $H_r$
    • The former follows since all the columns are distinct, and the latter since the first three columns of $H_r$, being the binary representations of 1, 2, 3, add up to $\mathbf 0$.

If $C$ is a binary code of block length $n$ with a minimum distance of 3, then

$$|C| \leq \frac{2^n}{n+1} \tag{1}$$

and it follows that the Hamming Code $C^{(r)}_\text{Ham}$ has the maximum possible number of codewords (and thus, an optimal rate) amongst all binary codes of block length $2^r - 1$ and a minimum distance of at least 3.

  • Proof:
    • For each $c \in C$, define its neighborhood $N(c) = \{y \in \{0,1\}^n \mid \Delta(y,c) \leq 1 \}$ of all strings that differ in at most one bit from $c$.
    • Since $C$ has distance 3, by the triangle inequality we have

$$N(c) \cap N(c') = \emptyset \; \text{ for } \; c \neq c' \in C$$

$$\therefore \quad 2^n \geq \Big|\bigcup_{c \in C} N(c) \Big| = \sum_{c \in C} |N(c)| = |C| \cdot (n+1)$$

yielding the desired upper bound on $|C|$ per this "Hamming Volume" upper bound on the size of codes.

  • Note that $C^{(r)}_\text{Ham}$ has a dimension of $2^r - 1 - r$ and therefore a size $2^{2^r - 1}/2^r$, which meets the upper bound for block length $n = 2^r - 1$
  • The general Hamming or Volume Bound for a binary code $C$ of block length $n$ and distance $d$ is given by

$$|C| \leq \frac{2^n}{\sum_{i=0}^{\lfloor \frac{d-1}{2} \rfloor}{n \choose i}} \tag{2}$$

For equality to hold in (2), Hamming Balls of radius $\lfloor \frac{d-1}{2} \rfloor$ around the codewords must cover each point in $\{0,1\}^n$ exactly once, giving a perfect packing of non-overlapping Hamming spheres that cover the full space.

  • Codes which meet the bound in (2) are therefore called perfect codes; a numeric check follows
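
A small numeric check of (2) (my code): for $d = 3$ and $n = 2^r - 1$, the bound evaluates to $2^n/(n+1) = 2^{2^r - 1 - r}$, exactly the size of $C^{(r)}_\text{Ham}$.

```python
from math import comb

def hamming_bound(n, d):
    """Volume bound (2): max codebook size for block length n, distance d."""
    radius = (d - 1) // 2
    return 2**n // sum(comb(n, i) for i in range(radius + 1))

for r in [2, 3, 4]:
    n = 2**r - 1
    assert hamming_bound(n, 3) == 2**(n - r)   # Hamming codes are perfect
```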

Perfect Codes

There are few perfect binary codes:

  • The Hamming Code $C^{(r)}_\text{Ham}$
  • The Golay Code
  • The trivial codes consisting of just 1 codeword, or the whole space, or the repetition code $\{0^n, 1^n\}$ for odd $n$

The Golay Code

Like the Hamming codes, the Golay Code can be constructed in a number of ways. $\text{Gol}_{23}$ consists of those $(c_0, c_1, ..., c_{22}) \in \{0,1\}^{23}$ for which the polynomial $c(X) = c_0 + c_1 X + ... + c_{22}X^{22}$ is divisible by

$$g(X) = 1 + X + X^5 + X^6 + X^7 + X^9 + X^{11}$$

in the ring $\mathbb F_2[X]/(X^{23}-1)$.

3.3 Dual of a Code

Since a Linear Code is a subspace of $\mathbb F_q^n$, we can define its dual in the orthogonal space – a co-code, if you will.

  • Dual Code: If $C \subseteq \mathbb F_q^n$ is a linear code, then its dual $C^\perp$ is the linear code over $\mathbb F_q$ given by

$$C^\perp = \{ z \in \mathbb F_q^n \mid \langle z, c \rangle = 0 \; \forall c \in C \}$$

where $\langle z, c \rangle = \sum^n_{i=1} z_i c_i$ is the dot product over $\mathbb F_q$ of vectors $z, c$.

  • Unlike vector spaces over $\mathbb R$, where the dual (or orthogonal complement) $W^\perp$ of a subspace $W \subseteq \mathbb R^n$ satisfies $W \cap W^\perp = \{ \mathbf 0 \}$ (and $W + W^\perp = \mathbb R^n$), for subspaces of $\mathbb F^n_q$, $C$ and $C^\perp$ can intersect non-trivially
  • In fact, we can devise $C^\perp \subseteq C$ (a self-orthogonal code) or even $C = C^\perp$, a self-dual code

3.4 Dual of the Hamming Code

The dual of the Hamming Code $C^{(r)}_\text{Ham}$ has a generator matrix $G_r = (H_r)^T$, a $(2^r-1) \times r$ matrix whose rows are all non-zero bit vectors of length $r$. This yields a $[2^r-1, r]_2$ code called a simplex code.

The Hadamard Code is obtained by adding an all-zeros row to $G_r$:

$\text{Had}_r$ is a $[2^r, r]_2$ linear code whose $2^r \times r$ generator matrix has all $r$-bit vectors as its rows.

  • Its encoding map encodes $x \in \mathbb F_2^r$ as the string in $\mathbb F_2^{2^r}$ consisting of the dot products $\langle x, a \rangle$ for every $a \in \mathbb F_2^r$
  • The Hadamard Code can also be defined over $\mathbb F_q$, by encoding a message in $\mathbb F_q^k$ with its dot product with every vector in $\mathbb F_q^k$
  • The Hadamard Code is the most redundant linear code in which no two codeword symbols are equal in every codeword, implying a robust distance property:
    • The Hadamard and simplex codes have a minimum distance of $2^{r-1}$. The $q$-ary Hadamard Code of dimension $r$ has distance $(1-\frac{1}{q})q^r$
    • Proof: For $x \neq 0$, $\langle x, a \rangle \neq 0$ for exactly $2^{r-1}$ (i.e. half) of the elements $a \in \mathbb F_2^r$.
    • Assume for definiteness that $x_1 = 1$.
    • Then, for every $a$, $\langle x, a \rangle + \langle x, a + e_1 \rangle = x_1 = 1$, and therefore exactly one of $\langle x, a \rangle$ and $\langle x, a + e_1 \rangle$ equals 1. The proof for the $q$-ary case is similar.
  • Binary codes cannot have a relative distance $\delta(C)$ of more than $\frac{1}{2}$, unless they only have a fixed number of codewords. Thus the relative distance of Hadamard Codes is optimal, but their rate is necessarily poor; a sketch of the binary Hadamard encoding and its distance follows
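
The sketch mentioned above (my code): encode $x \in \mathbb F_2^r$ as the $2^r$ dot products $\langle x, a \rangle$, and confirm that every non-zero codeword has weight exactly $2^{r-1}$.

```python
import itertools

def hadamard_encode(x):
    """Map x in F_2^r to the 2^r dot products <x, a> over all a in F_2^r."""
    return [sum(xi * ai for xi, ai in zip(x, a)) % 2
            for a in itertools.product([0, 1], repeat=len(x))]

r = 3
weights = {sum(hadamard_encode(x))
           for x in itertools.product([0, 1], repeat=r) if any(x)}
assert weights == {2**(r - 1)}       # every non-zero codeword: weight 4
```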

Code Families and Asymptotically Good Codes

  • The Hamming and Hadamard codes exhibit two extreme trade-offs between rate and distance
    • The Hamming codes' rate approaches 1 (optimal), but their distance is only 3
    • Conversely, the Hadamard codes have an optimal relative distance of $\frac{1}{2}$, but their rate approaches 0
  • The natural question that follows is: is there a class of codes which have good rate and good relative distance, such that neither approaches 0 for large block lengths?
  • To answer this question, we consider the asymptotic behavior of code families:
    • Define a code family $\mathcal C$ to be an infinite collection $\{ C_i \mid i \in \mathbb N \}$, where $C_i$ is a $q_i$-ary code of block length $n_i$, with $n_i > n_{i-1}$ and $q_i \geq q_{i-1}$
    • Many constructions of codes will belong to an infinite family of codes that share general structure and properties whose asymptotic behavior guides their construction
    • We also need to extend the notions of rate and relative distance to these infinite families:

$$\mathcal R(\mathcal C) = \lim_{i \rightarrow \infty } \frac{k_i}{n_i}$$

$$\delta(\mathcal C) = \lim_{i \rightarrow \infty } \frac{\Delta(C_i)}{n_i}$$

  • A $q$-ary family of codes is said to be asymptotically good if its rate and relative distance are bounded away from zero: there exist constants $\mathcal R_0 > 0, \delta_0 > 0$ such that $\mathcal R(\mathcal C) > \mathcal R_0, \delta(\mathcal C) > \delta_0$

While a good deal is known about the existence (and even explicit construction) of good codes, as well as their limitations, the best possible asymptotic trade-off between rate and relative distance for binary codes remains a fundamental open question, having seen no improvement since the late 70s, in part due to the fundamental tension between large rate and large relative distance.

Error Correction Coding: Mathematical Methods and Algorithms - Todd Moon

1 | A Context for Error Correction Coding

1.2 - Where Are Codes

  • Error Control Codes comprise error correction and error detection

Da heck is an ISBN

  • For a valid ISBN, the sum-of-the-sum of the digits must be equal to $0 \bmod 11$:

$$\underbrace{0}_{\text{country}} - \underbrace{20}_{\text{publisher}} - \underbrace{1\text{-}36186}_{\text{book \#}} - \underbrace{8}_{\text{check}}$$
| digit | cum. sum | cum. sum of sum |
|-------|----------|-----------------|
| 0     | 0        | 0               |
| 2     | 2        | 2               |
| 0     | 2        | 4               |
| 1     | 3        | 7               |
| 3     | 6        | 13              |
| 6     | 12       | 25              |
| 1     | 13       | 38              |
| 8     | 21       | 59              |
| 6     | 27       | 86              |
| 8     | 35       | 121             |

The final sum is $121 = 0 \bmod 11$.
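
A sketch of the check as code (mine):

```python
def isbn10_valid(digits):
    """Sum-of-the-sum ISBN-10 check: final cumulative sum must be 0 mod 11."""
    cum = cum2 = 0
    for d in digits:        # a trailing 10 would encode the check digit 'X'
        cum += d            # running sum
        cum2 += cum         # running sum of the running sums
    return cum2 % 11 == 0

assert isbn10_valid([0, 2, 0, 1, 3, 6, 1, 8, 6, 8])   # 0-20-1-36186-8
```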

1.3 - The Communication System

Information theory treats information as a pseudo-physical quantity which can be measured, transformed, stored, and moved from place to place. A fundamental concept of information theory is that information is conveyed by the resolution of uncertainty.

  • Probabilities are used to describe uncertainty. For a discrete random variable $X \in \{0, 1\}$, the information conveyed by observing an outcome $x$ is $-\log_2 P(X=x)$ bits.

    • If $P(X=1) = 1$, the outcome of 1 is certain, so observing $X=1$ yields $-\log_2(1) = 0$ bits of information.
    • Conversely, observing $X=0$ in this case yields $-\log_2(0) = \infty$; a total surprise at observing an impossible outcome
  • Entropy is the average information. For our above example with a discrete, binary, random variable $X$, we have either $H_2(X)$ (entropy of the source) or $H_2(p)$ (a function of the outcome probabilities) given by

$$H_2(X) = H_2(p) = \mathbb E[- \log_2 P(X)] = -p \log_2(p) - (1-p) \log_2(1-p) \text{ bits}$$

  • For a source $X$ having $N$ outcomes $x_1, x_2, ..., x_N$ with probabilities $P(X=x_i) = p_i$, $i = 1, 2, ..., N$, the entropy is (see the sketch after this list)

$$H(X) = \mathbb E [-\log_2 P(X)] = -\sum_{i=1}^N p_i \log_2 p_i \text{ bits}$$
  • Source-Channel Separation Theorem: Treating the problems of data compression and error correction separately, rather than seeking a jointly optimal source/channel coding solution, is asymptotically optimal as block sizes grow

  • Because of the redundancy introduced by the channel coder, there must be more symbols at the output of the coder than at the input. A channel coder operates by accepting a block of $k$ input symbols and outputting a block of $n > k$ symbols.

  • The Rate of a channel coder is $R = k/n < 1$

  • Channel Coding Theorem: Provided that the rate $R$ of transmission is less than the channel capacity $C$, there exists a code such that the probability of error can be made arbitrarily small
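
The sketch promised above (my code) for the two entropy formulas:

```python
from math import log2

def binary_entropy(p):
    """H_2(p) = -p log2 p - (1-p) log2 (1-p), with H_2(0) = H_2(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def entropy(probs):
    """H(X) = -sum p_i log2 p_i over the outcome probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(binary_entropy(0.5))            # 1.0 bit: a fair coin
print(entropy([0.5, 0.25, 0.25]))     # 1.5 bits
```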

1.4 - Binary Phase-Shift Keying

  • The process of mapping a binary sequence to a waveform for transmission over a channel
  • Given a binary sequence $\{..., b_{-2}, b_{-1}, b_{0}, b_{1}, b_{2}, ... \}$, we map the set $\{0, 1\} \to \{-1, 1\}$ via

$$\tilde{b}_i = (2b_i - 1) \text{ or } \tilde{b}_i = -(2b_i - 1)$$

  • We also have $a_i = \sqrt{E_b}(2b_i - 1) = -\sqrt{E_b}(-1)^{b_i} = \sqrt{E_b}\tilde{b}_i$, a mapping of a bit $b_i$ (or $\tilde{b}_i$) to a signal amplitude which multiplies a waveform $\varphi_1(t)$, a unit-energy signal

$$\int^\infty_{-\infty} \varphi_1(t)^2 dt = 1$$

over $[0, T)$. A bit $b_i$ arriving at time $iT$ can be represented as the signal $a_i\varphi_1(t - iT)$, where the energy required to transmit the bit $b_i$ is given by

$$\int^\infty_{-\infty} (a_i \varphi_1(t))^2 dt = E_b$$

  • The transmitted signals $\pm \sqrt{E_b}\varphi_1(t)$ are the points $\pm \sqrt{E_b}$ in a (for our purposes: one-) dimensional signal space on the unit-energy $\varphi_1$ axis, such that a sequence of bits $[1,1,0,1,0]$ can be expressed as a waveform.

We can expand this representation to two dimensions by introducing another signal $\varphi_2(t)$ and modulating the two to establish a signal constellation. Let $M = 2^m$, for some integer $m$, be the number of points in a constellation. $M$-ary transmission is obtained by placing the $M$ points $\mathcal{S} = \{(a_{1k}, a_{2k}),\, k = 0, 1, ..., M-1 \}$ in this signal space and assigning each point a unique pattern of $m$ bits.

  • Note that the assignment of bits to constellation points is such that adjacent points differ by only 1 bit. Such an assignment is called Gray code order. Since it is most probable that errors will move a point to an adjacent point, this reduces the probability of bit error

Associated with each signal (carrying $m$ bits) $s_k(t) = a_{1k}\varphi_1(t) + a_{2k}\varphi_2(t)$ and signal constellation point $(a_{1k}, a_{2k}) \in \mathcal S$ is a signal energy

$$E_k = \int^\infty_{-\infty} (s_k(t))^2 dt = a_{1k}^2 + a_{2k}^2$$

The average signal energy $E_s$ is the average of all signal energies, assuming that each point is used with equal probability

$$E_s = \frac{1}{M} \sum_{k=0}^{M-1} \int^\infty_{-\infty} (s_k(t))^2 dt = \frac{1}{M} \sum_{k=0}^{M-1} (a_{1k}^2 + a_{2k}^2)$$

and can be related to the average energy per bit

$$E_b = \frac{\text{energy per signal}}{\text{bits per signal}} = \frac{E_s}{m}$$
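
A sketch (mine) of the BPSK bit-to-amplitude mapping $a_i = \sqrt{E_b}(2b_i - 1)$:

```python
from math import sqrt

def bpsk_map(bits, E_b=1.0):
    """Map bits {0,1} to amplitudes -sqrt(E_b), +sqrt(E_b)."""
    return [sqrt(E_b) * (2 * b - 1) for b in bits]

print(bpsk_map([1, 1, 0, 1, 0]))      # [1.0, 1.0, -1.0, 1.0, -1.0]
```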

2 | Groups and Vector Spaces

2.1 - Introduction

Linear block codes form a group and a vector space, structure which will prove useful throughout.

2.2 - Groups

A Group formalizes some basic rules of arithmetic necessary for cancellation and the solution of simple equations.

A binary operation $*$ on a set is a rule that assigns to each ordered pair of elements $a, b$ of the set some element of the set.

  • Since the operation is defined to return an element of the set, the operation is closed

  • We can therefore assume that all binary operations are closed

  • Examples include: $\min(a, b), \; fst(a,b), \; a + b$

A Group $\langle G, * \rangle$ is a set $G$ with a binary operation $*$ on $G$ such that

  • The operator is associative: for any $a, b, c \in G$, $(a * b) * c = a * (b * c)$
  • For all $a \in G$ there exists $b \in G$ which is the inverse of $a$, such that $a * b = b * a = e$, where $e$ is called the identity of the group. The inverse can be denoted $a^{-1}$ or $-a$ for multiplication-like and addition-like operations, respectively

When $*$ is obvious from context, the group may be referred to as just the set $G$.

If $G$ has a finite number of elements, it is said to be a finite group with order $|G|$, the number of elements.

A group $G$ is commutative if $a * b = b * a \; \forall a, b \in G$.

  • $\langle \Z, + \rangle$, the integers under addition, forms a group with identity $0$ since $a + 0 = 0 + a = a \; \forall a \in \Z$

    • The inverse of $a \in \Z$ is $-a$, and the group is commutative.
    • Commutative groups are also called Abelian
  • $\langle \Z, \cdot \rangle$, the integers under multiplication, does not form a group since there does not exist a multiplicative inverse for every element of the integers.

  • $\langle \mathbb Q \backslash \{0\}, \cdot \rangle$, the rationals without zero under multiplication, with identity $1$ and inverse $a^{-1} = \frac{1}{a}$, forms a group

These rules are strong enough to introduce the notion of cancellation. In a group $G$, if $a * b = a * c$, then $b = c$ by left cancellation via

$$\begin{aligned} & a^{-1} * (a * b) = a^{-1} * (a * c) \\ \implies & (a^{-1} * a) * b = (a^{-1} * a) * c \\ \implies & b = c \end{aligned}$$

and similarly for right hand cancellation via properties of associativity and identity.

We can also identify that solutions to linear equations of the form $a * x = b$ are unique.

  • Immediately, we get $x = a^{-1} * b$. If $x_1, x_2$ are two solutions such that

$$a * x_1 = b = a * x_2$$

then by cancellation, we get $x_1 = x_2$.

For example, let $\langle \Z_5, + \rangle$ be addition $\bmod 5$ on the set $\{0, 1, 2, 3, 4\}$ with identity $0$.

| +     | **0** | **1** | **2** | **3** | **4** |
|-------|-------|-------|-------|-------|-------|
| **0** | 0     | 1     | 2     | 3     | 4     |
| **1** | 1     | 2     | 3     | 4     | 0     |
| **2** | 2     | 3     | 4     | 0     | 1     |
| **3** | 3     | 4     | 0     | 1     | 2     |
| **4** | 4     | 0     | 1     | 2     | 3     |
  • Since $0$ appears in each row and column, every element has an inverse.
  • By uniqueness of solutions, every element must appear in every row and column, which it does. So $\langle \Z_5, + \rangle$ is an Abelian group.

One way to form groups is via the Cartesian product. Given groups $\langle G_1, * \rangle, \langle G_2, * \rangle, \langle G_3, * \rangle, ..., \langle G_r, * \rangle$, the direct product group $G = G_1 \times G_2 \times ... \times G_r$ has elements $(a_1, a_2, ..., a_r)$ where $a_i \in G_i$. The operation is performed elementwise such that if

$$(a_1, a_2, ..., a_r) \in G$$

and

$$(b_1, b_2, ..., b_r) \in G$$

then

$$(a_1, a_2, ..., a_r) * (b_1, b_2, ..., b_r) = (a_1 * b_1, a_2 * b_2, ..., a_r * b_r)$$

Example

The group $\langle \Z_2 \times \Z_2, + \rangle$ consists of the tuples and addition $\bmod 2$:

| +                | $(\mathbf{0,0})$ | $(\mathbf{0,1})$ | $(\mathbf{1,0})$ | $(\mathbf{1,1})$ |
|------------------|------------------|------------------|------------------|------------------|
| $(\mathbf{0,0})$ | $(0,0)$          | $(0,1)$          | $(1,0)$          | $(1,1)$          |
| $(\mathbf{0,1})$ | $(0,1)$          | $(0,0)$          | $(1,1)$          | $(1,0)$          |
| $(\mathbf{1,0})$ | $(1,0)$          | $(1,1)$          | $(0,0)$          | $(0,1)$          |
| $(\mathbf{1,1})$ | $(1,1)$          | $(1,0)$          | $(0,1)$          | $(0,0)$          |

This is called the Klein 4-group. The next example introduces the concept of permutations as elements of a group, with composition of functions as our operation, as opposed to simple arithmetic.

Example

A permutation of a set $A$ is a bijection (one-to-one, and onto) of $A$ onto itself. Let

$$A = \{ 1,2,3,4 \}$$

then a permutation

$$p_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix}$$

meaning

$$p_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ \downarrow & \downarrow & \downarrow & \downarrow \\ 3 & 4 & 2 & 1 \end{pmatrix}$$

There are $n!$ different permutations on $n$ distinct elements. We can think of $p_1$ as a prefix operation: $p_1 \circ 1 = 3$ or $p_1 \circ 4 = 1$. If

$$p_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix}$$

then

$$p_2 \circ p_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 2 & 1 \end{pmatrix} \circ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix} = e$$

showing that composition of permutations is closed over the set of permutations, with the identity $e$ and the inverse

$$p_1^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 1 & 2 \end{pmatrix} = p_2$$

Composition of permutations is associative: for $p_1, p_2, p_3$:

$$(p_1 \circ p_2) \circ p_3 = p_1 \circ (p_2 \circ p_3)$$

and thus the set of all $n!$ permutations on $n$ elements forms a group under composition.

  • This is known as the symmetric group on $n$ letters, $S_n$.
  • Note that composition is not commutative in general, since $p_1 \circ p_2 \neq p_2 \circ p_1$ for some pairs of permutations, so $S_4$ is a non-commutative group; a sketch of composition as code follows
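
The sketch mentioned above (my code), treating permutations as dicts and composition as function application:

```python
p1 = {1: 3, 2: 4, 3: 2, 4: 1}
p2 = {1: 4, 2: 3, 3: 1, 4: 2}

def compose(q, p):
    """Return q o p: apply p first, then q."""
    return {x: q[p[x]] for x in p}

identity = {x: x for x in p1}
assert compose(p2, p1) == identity    # p2 = p1^{-1}
```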

2.2.1 Subgroups

A subgroup $\langle H, * \rangle$ of $G$ is a group formed from a subset of elements of $G$, the relation between the two sets being indicated by $H < G$; when $H \subset G$ is a proper subset, $H$ is said to be a proper subgroup.

The subgroups $H = \{e\}$ and $H = G$ are the trivial subgroups.

Example

Let $G = \langle \Z_6, + \rangle$ and $H = \langle \{ 0, 2, 4\}, +\rangle$, both $\bmod 6$, so that $H < G$. Another subgroup is $K = \langle \{0, 3\}, + \rangle$.

Similarly, $\langle \Z, + \rangle < \langle \mathbb Q, + \rangle < \langle \mathbb R, + \rangle < \langle \mathbb C, + \rangle$

Example

Consider the symmetric group $S_4$, which has a subgroup formed by the following closed set of permutations:

$$p_0 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \; p_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}, \\ p_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix}, \; p_3 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \end{pmatrix}, \\ p_4 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}, \; p_5 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \end{pmatrix}, \\ p_6 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 4 \end{pmatrix}, \; p_7 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}$$

2.2.2 Cyclic Groups and the Order of an Element

In a group $G$ with a multiplication-like operation, we use $a^n$ to indicate

$$\underbrace{a * a * ... * a}_{n \text{ times}}$$

with $a^0 = e$ being the identity element for $G$.

For a group with an additive operation, $na$ is used to indicate

$$\underbrace{a + a + ... + a}_{n \text{ times}}$$

If $a \in G$, any subgroup containing $a$ must also include $a^2, a^3$, etc.

It must also adhere to $e = a a^{-1}$ and $a^{-n} = (a^{-1})^n$.

For any $a \in G$, the set $\{a^n \mid n \in \Z\}$ generates a subgroup of $G$ called the cyclic subgroup. $a$ is said to be the generator of the subgroup, and the resultant group is denoted $\langle a \rangle$.

If every element of a group can be generated from a single element, the group is said to be cyclic.

Example

$\langle \Z_5, + \rangle$ is cyclic since every element can be generated by $a = 2$:

$$\begin{aligned} 2 &= 2 \\ 2 + 2 &= 4 \\ 2 + 2 + 2 &= 1 \\ 2 + 2 + 2 + 2 &= 3 \\ 2 + 2 + 2 + 2 + 2 &= 0 \end{aligned}$$

In a group $G$ with $a \in G$, the smallest $n$ such that $a^n$ is equal to the identity in $G$ is said to be the order of the element $a$.

  • If no such $n$ exists, $a$ is of infinite order.
  • In $\Z_5$, 2 is of order 5, as are all non-zero elements of the set.

Example

If $G = \langle \Z_6, + \rangle$, then

$$\begin{aligned} \langle 2 \rangle &= \{ 0, 2, 4\} \\ \langle 3 \rangle &= \{ 0, 3\} \\ \langle 5 \rangle &= \{ 0, 1, 2, 3, 4, 5 \} \end{aligned}$$

Therefore $a \in \Z_6$ is a generator for the group if and only if $a$ and $6$ are relatively prime. A small sketch of this subgroup generation follows.
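
The sketch mentioned above (my code) for generating cyclic subgroups $\langle a \rangle$ of $\langle \Z_6, + \rangle$:

```python
def generated_subgroup(a, n=6):
    """Repeatedly add a (mod n) until the identity 0 reappears."""
    elems, x = {0}, a % n
    while x != 0:
        elems.add(x)
        x = (x + a) % n
    return sorted(elems)

print(generated_subgroup(2))    # [0, 2, 4]
print(generated_subgroup(3))    # [0, 3]
print(generated_subgroup(5))    # [0, 1, 2, 3, 4, 5]: gcd(5, 6) = 1
```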

[IMAGE Of the planes]

2.2.3 - Cosets

Given a subgroup $H < G$ ($G$ not necessarily being commutative) and $a \in G$, the left coset $a * H$ is the set

$$\{a * h \mid h \in H \}$$

and the right coset $H * a$ is given by the symmetric relation. In a commutative group, they are obviously equal.

Let $G$ be a group, $H$ a subgroup of $G$, and $a * H \subseteq G$ a left coset. Then $b \in a * H$ if and only if $b = a * h$ for some $h \in H$. By cancellation, this holds if and only if $a^{-1} * b \in H$.

To determine if $a, b$ are in the same (left) coset, we simply check the above condition.

Example

Let $G = \langle \Z, + \rangle$ and $S_0 = 3\Z = \{ ..., -6, -3, 0, 3, 6, ...\}$, such that $S_0 < G$. Now, form the cosets:

$$\begin{aligned} S_1 &= S_0 + 1 = \{ ..., -5, -2, 1, 4, 7, ...\} \\ S_2 &= S_0 + 2 = \{ ..., -4, -1, 2, 5, 8, ...\} \end{aligned}$$

and observe that neither $S_1$ nor $S_2$ is a subgroup (they do not contain the identity), but the union of all three covers the original group: $G = S_0 \cup S_1 \cup S_2$.

We can determine whether or not two numbers $a = 4, b = 6$ are in the same coset of $S_0$ by checking whether $(-a) + b \in S_0$:

$$-a + b = 2 \notin S_0$$

so they are not. A small sketch of this check follows.
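
The sketch mentioned above (my code) for the coset-membership test:

```python
def same_coset(a, b, m=3):
    """a and b lie in the same coset of mZ iff (-a) + b is a multiple of m."""
    return (-a + b) % m == 0

assert not same_coset(4, 6)     # -4 + 6 = 2, not in 3Z
assert same_coset(4, 7)         # -4 + 7 = 3, in 3Z
```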

2.2.4 - Lagrange's Theorem

Basically, it's a prescription of subgroup size relative to its group.

  • Every coset of $H$ in a group $G$ has the same number of elements
  • Let $(a * h_1), (a * h_2) \in a * H$ be two elements in the coset $a * H$. If they are equal, then we have $h_1 = h_2$, and therefore the elements of a coset are uniquely identified by elements in $H$

It follows, then, that we have the following properties of cosets:

  • reflexivity: An element $a$ is in the same coset as itself
  • symmetry: If $a, b$ are in the same coset, then $b, a$ are in the same coset
  • transitivity: If $a, b$ are in the same coset, and $b, c$ are in the same coset, then $a, c$ are in the same coset as well

Therefore, the relation of "being in the same coset" is an equivalence relation, and thus, every equivalence relation partitions its elements into disjoint sets.

  • It follows then that the distinct cosets of $H$ in a group $G$ are disjoint

With these properties and lemmas, Lagrange's Theorem states that: if $G$ is a group of finite order, and $H$ a subgroup of $G$, then the order of $H$ divides the order of $G$, i.e. $|G| \bmod |H| = 0$. We say $a \mid b$, "$a$ divides $b$", without remainder.

  • If $|G| < \infty$ and $H < G$, then $|H| \big| |G|$

Lagrange's Theorem implies that every group of prime order is cyclic.

  • Let $G$ be of prime order, with $a \in G$, $a \neq e$, where $e = \mathbf{id}_G$. Let $H = \langle a \rangle$, the cyclic subgroup generated by $a$. Then $a, e \in H$, so $|H| > 1$. But by Lagrange's Theorem, the order of $H$ must ("cleanly") divide the order of $G$. Since $|G|$ is prime, we must have $|H| = |G|$; thus $a$ generates $G$, so $G$ must be cyclic.

2.2.5 - Induced Operations; Isomorphism

Returning to the three cosets we defined earlier, we have $S = \{S_0, S_1, S_2 \}$ and define addition on $S$ as follows: for

$$A, B, C \in S; \quad A + B = C$$

if and only if $a + b = c$ for representatives $a \in A$, $b \in B$, $c \in C$. That is, addition of the sets is defined by representatives in the sets. The operation is said to be the induced operation on the cosets: $S_1 + S_2 = S_0$,

taking $1 \in S_1$, $2 \in S_2$ and noting that $1 + 2 = 3 \in S_0$. Similarly, $S_1 + S_1 = S_2$ via $1 + 1 = 2$, taking the representatives to be $1 \in S_1$. Based on this induced operation, we can obtain the following table

| +     | $S_0$ | $S_1$ | $S_2$ |
|-------|-------|-------|-------|
| $S_0$ | $S_0$ | $S_1$ | $S_2$ |
| $S_1$ | $S_1$ | $S_2$ | $S_0$ |
| $S_2$ | $S_2$ | $S_0$ | $S_1$ |

which we say is $\cong$ (isomorphic) to the table for $\langle \Z_3, + \rangle$:

| +     | 0 | 1 | 2 |
|-------|---|---|---|
| **0** | 0 | 1 | 2 |
| **1** | 1 | 2 | 0 |
| **2** | 2 | 0 | 1 |

Two groups $\langle G, * \rangle, \langle \mathcal G, \diamond \rangle$ are said to be (group) isomorphic if there exists a one-to-one, onto function $\phi: G \to \mathcal G$ called the isomorphism such that

$$\forall a, b \in G, \; \underbrace{\phi(a * b)}_{\text{operation in } G} = \underbrace{\phi(a) \diamond \phi(b)}_{\text{operation in } \mathcal G}$$

We denote isomorphism via $G \cong \mathcal G$. Isomorphic groups are essentially "the same thing".

Let $\langle G, * \rangle$ be a group, $H$ a subgroup, and $S = \{ H_0 = H, H_1, H_2, ..., H_m\}$ the set of cosets in $G$. The induced operation between cosets $A, B \in S$ is defined by $A * B = C$ if and only if $a * b = c$ for any $a \in A$, $b \in B$, $c \in C$, provided that the operation is well-defined (no ambiguity across choices of operands).

For any commutative group, the induced operation is always well-defined; otherwise, it is well-defined only for normal subgroups.

  • A subgroup $H$ of $G$ is normal if $g^{-1}Hg = H \; \forall g \in G$. All subgroups of Abelian groups are normal.

If $H$ is a subgroup of a commutative group $\langle G, * \rangle$, the induced operation $*$ on the set of cosets of $H$ satisfies

$$(a * b) * H = (a * H) * (b * H)$$

The group formed by the cosets of $H$ in a commutative group (or a normal subgroup of a group) $G$ with the induced operation is said to be the factor group of $G \bmod H$, denoted $G / H$, and the cosets are called the residue classes of $G \bmod H$.

  • We could express $\Z_3 \cong \Z_6/H_0$, $\Z/3\Z \cong \Z_3$, and in general $\Z / n\Z \cong \Z_n$.

A Lattice is formed by taking all the possible integer linear combinations of a set of basis vectors. That is, if $[\mathbf{v_1}, \mathbf{v_2}, ..., \mathbf{v_n}]$ are a set of linearly independent vectors forming the columns of a matrix $V$, a lattice is then formed from the basis by

$$\varLambda = \{ V\mathbf{z} : \mathbf{z} \in \Z^n \}$$

e.g. the lattice formed by $V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is the set of points with integer coordinates in the plane, denoted $\Z^2$.

For the lattice $\varLambda = \Z^2$, let $\varLambda' = 2\Z^2$ be a subgroup with the cosets

$$\begin{aligned} S_0 = \varLambda' \, (\cdot), \; &S_1 = (1, 0) + \varLambda' \, (\circ) \\ S_2 = (0, 1) + \varLambda' \, (\diamond), \; &S_3 = (1, 1) + \varLambda' \, (\star) \end{aligned}$$

and $\varLambda / \varLambda' \cong \Z_2 \times \Z_2$.

2.2.6 - Homomorphism

A homomorphism is a weaker condition than the structural and relational equivalence defined by an isomorphism. A homomorphism exists when two sets have the same algebraic structure, but they might have a different number of elements.

The groups $\langle G, * \rangle, \langle \mathcal G, \diamond \rangle$ are said to be homomorphic if there exists a function (not necessarily one-to-one) $\phi: G \to \mathcal G$ called the homomorphism such that:

$$\phi(a * b) = \phi(a) \diamond \phi(b)$$

Example

Let $G = \langle \Z, + \rangle$, $\mathcal G = \langle \Z_n, + \rangle$, and $\phi(a) = a \bmod n$. For $a, b \in \Z$, we have

$$\phi(a + b) = \phi(a) + \phi(b)$$

Thus $\Z, \Z_n$ are homomorphic, though they clearly have different orders; a quick sketch of this check follows.
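
The quick check promised above (my code):

```python
# phi(a) = a mod n preserves addition: phi(a + b) = phi(a) + phi(b) in Z_n.
n = 5
phi = lambda a: a % n
for a in range(-10, 10):
    for b in range(-10, 10):
        assert phi(a + b) == (phi(a) + phi(b)) % n
```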

Let $\langle G, * \rangle$ be a commutative group and $H$ a subgroup, so that $G / H$ is the factor group. Let $\phi: G \to G/H$ be defined by $\phi(a) = a * H$; then $\phi$ is a homomorphism, known as the natural or canonical homomorphism.

The kernel of a homomorphism $\phi$ of a group $G$ into a group $\mathcal G$ is the set of all elements of $G$ which are mapped to $\mathbf{id}_\mathcal{G}$ by $\phi$.

2.3 - Fields

A Field $\langle \mathbb F, + , \cdot \rangle$ is a set of objects $\mathbb F$ on which addition, multiplication, and their inverse operations apply arithmetically (analogously to their behavior on the reals). The operations of a field must satisfy the following conditions:

  1. Closure under addition: $\forall a, b \in \mathbb F, \; a + b \in \mathbb F$
  2. There is an additive identity: $\forall a \in \mathbb F, \; \exists e \in \mathbb F \; \text{s.t.} \; a + e = e + a = a$, which we denote $0$
  3. There is an additive inverse (subtraction): $\forall a \in \mathbb F, \; \exists b \in \mathbb F \text{ s.t. } a + b = b + a = 0$, denoted $-a = b$
  4. Additive associativity: $\forall a, b, c \in \mathbb F, \; (a + b) + c = a + (b + c)$
  5. Additive commutativity: $\forall a, b \in \mathbb F, \; a + b = b + a$
  6. Closure under multiplication: $\forall a, b \in \mathbb F, \; a \cdot b \in \mathbb F$
  7. There is a multiplicative identity: $\forall a \in \mathbb F \backslash \{0\}, \; \exists e \in \mathbb F$, denoted $1$, such that $a \cdot 1 = a$
  8. Multiplicative inverse: $\forall a \in \mathbb F \backslash \{0\}, \; \exists b \in \mathbb F \text{ s.t. } a \cdot b = b \cdot a = 1$, denoted $a^{-1} = b$ and called the reciprocal of $a$
  9. Multiplicative associativity: $\forall a, b, c \in \mathbb F, \; (a \cdot b) \cdot c = a \cdot (b \cdot c)$
  10. Multiplicative commutativity: $\forall a, b \in \mathbb F, \; a \cdot b = b \cdot a$
  11. Multiplicative distribution: $\forall a, b, c \in \mathbb F, \; a \cdot (b + c) = a \cdot b + a \cdot c$

The first four requirements imply that the elements of $\mathbb F$ form a group under addition and, with the fifth requirement, commutativity is obtained. The latter conditions imply that the non-zero elements form a commutative group under multiplication.

Like the shorthand for groups, a field $\langle \mathbb F, + , \cdot \rangle$ may be referred to as just $\mathbb F$, and the order of the field may be subscripted as well: a field with $q$ elements may be denoted $\mathbb F_q$.

Example

The field $\mathbb F_2 = \Z_2$ has the following operation tables, which are of particular importance for binary codes:

| XOR (+) | 0 | 1 |
|---------|---|---|
| **0**   | 0 | 1 |
| **1**   | 1 | 0 |

| AND (·) | 0 | 1 |
|---------|---|---|
| **0**   | 0 | 0 |
| **1**   | 0 | 1 |

The notions of isomorphism and homomorphism also exist for fields.

Two fields $F, \mathcal F$ are (field) isomorphic if there exists a bijection $\phi: F \to \mathcal F$ such that $\forall a, b \in F$:

$$\phi(a + b) = \phi(a) + \phi(b)$$
$$\phi(ab) = \phi(a)\phi(b)$$

For example, the field $\langle \{-1, 1 \}, +, \cdot \rangle$ is isomorphic to $GF(2)$ defined above via

$$\phi = \begin{cases} 0 \to -1 \\ 1 \to 1 \end{cases}$$

Lastly, fields $F, \mathcal F$ are homomorphic if such a structure-preserving map $\phi$ exists which is not necessarily bijective.

2.4 - Linear 👏 Algebra 👏 Review

Let $V$ be a set of elements called vectors and $\mathbb F$ be a field of elements called scalars. The additive operator is defined element-wise between vectors, and scalar multiplication is defined such that for a scalar $a$ and a vector $\mathbf v$:

$$\mathbf v \in V, \; a \cdot \mathbf v \in V$$

Then, $V$ is a vector space over $\mathbb F$ if the additive and multiplicative operators satisfy the following conditions:

  1. $V$ forms a commutative group under addition
  2. For any elements $a \in \mathbb F, \mathbf v \in V$, we have $a \cdot \mathbf v \in V$. Together with (1), this implies that we must have $a \cdot \mathbf v + b \cdot \mathbf u \in V$ for all $\mathbf{u, v} \in V$ and $a, b \in \mathbb F$
  3. Addition and multiplication are distributive: $(a + b) \cdot \mathbf v = a \cdot \mathbf v + b \cdot \mathbf v$ and $a \cdot (\mathbf {u + v}) = a \cdot \mathbf u + a \cdot \mathbf v$
  4. The multiplicative operation is associative: $(a \cdot b) \cdot \mathbf v = a \cdot (b \cdot \mathbf v)$

Then, $\mathbb F$ is called the scalar field of the vector space $V$.

Examples

  1. The set of $n$-tuples $(v_0, v_1, ..., v_{n-1})$ with elements $v_i \in \mathbb R$, denoted $\mathbb R^n$, with element-wise addition:

$$(v_0, v_1, ..., v_{n-1}) + (u_0, u_1, ..., u_{n-1}) = (v_0 + u_0, v_1 + u_1, ..., v_{n-1} + u_{n-1})$$

and scalar multiplication:

$$a \cdot (v_0, v_1, ..., v_{n-1}) = (av_0, av_1, ..., av_{n-1})$$

  2. The set of $n$-tuples with elements $v_i \in \mathbb F_2$ forms a vector space denoted $\mathbb F^n_2$ with $2^n$ elements. For $n = 3$, the space is completely defined:

$$\begin{aligned} (0,0,0) \quad (0,0,1) \quad (0, 1, 0) \quad (0, 1, 1) \\ (1,0,0) \quad (1,0,1) \quad (1, 1, 0) \quad (1, 1, 1) \end{aligned}$$

  3. And, in general, the set $V = \mathbb F^n_q$ of $n$-tuples over the field $\mathbb F_q$, with element-wise addition and scalar multiplication, constitutes a vector space.

A Linear Combination is just the application of these operations in a vector space:

$$a_1\mathbf{v_1} + a_2\mathbf{v_2} + \cdots + a_k\mathbf{v_k}, \; a_i \in \mathbb F, \mathbf{v_i} \in V$$

We observe that the linear combination can be obtained by stacking the vectors as columns:

$$G = [\mathbf{v_1} \, \mathbf{v_2} \cdots \mathbf{v_k}]$$

and taking the product with the column vector of scalar coefficients:

$$a_1\mathbf{v_1} + a_2\mathbf{v_2} + \cdots + a_k\mathbf{v_k} = G \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{bmatrix}$$
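
A small numeric sketch (mine) of a linear combination as the matrix-vector product $G\mathbf a$ over $\mathbb F_2$:

```python
import numpy as np

v1, v2 = np.array([1, 0, 1, 0]), np.array([0, 1, 1, 0])
G = np.column_stack([v1, v2])         # the v_i stacked as columns
a = np.array([1, 1])                  # scalar coefficients

assert np.array_equal((G @ a) % 2, (v1 + v2) % 2)   # -> [1, 1, 0, 0]
```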

Spanning Set: Let $V$ be a vector space. A set of vectors $G = \{\mathbf{v_1, v_2, ..., v_k} \} \subseteq V$ is said to be a spanning set if every vector $\mathbf{v} \in V$ can be expressed as a linear combination of vectors in $G$. That is, $\forall \mathbf{v} \in V$, there exists a set of scalars $a_1, a_2, ..., a_k$ such that $\mathbf v = a_1\mathbf{v_1} + ... + a_k \mathbf{v_k}$. The set of vectors obtained from every possible linear combination of vectors in $G$ is called the span of $G$, aptly denoted $span(G)$.

If we treat $G$ as a matrix whose columns are the vectors $\mathbf v_i$,

  • $span(G)$ is the set of linear combinations of the columns of $G$
  • the column space of $G$ is the space obtained by the linear combinations of $G$'s columns
  • and similarly, the row space of $G$ is the space obtained by the linear combinations of $G$'s rows

If there are redundant vectors in the spanning set, not all are needed to span the space as they can be expressed in terms of other vectors in the spanning set – the vectors of the spanning set are not linearly independent.

A set of vectors $\mathbf v_1, \mathbf v_2, ..., \mathbf v_k$ is said to be linearly dependent if a set of scalars $a_1, a_2, ..., a_k$ exists, with not all $a_i = 0$, such that their linear combination

$$a_1\mathbf{v_1} + a_2\mathbf{v_2} + \cdots + a_k\mathbf{v_k} = \mathbf 0$$

Conversely, a set of vectors $\mathbf v_1, \mathbf v_2, ..., \mathbf v_k$ is said to be linearly independent if, whenever

$$a_1\mathbf{v_1} + a_2\mathbf{v_2} + \cdots + a_k\mathbf{v_k} = \mathbf 0$$

it must be the case that $a_1 = a_2 = ... = a_k = 0$.

A spanning set for a vector space $V$ that has the minimum possible number of vectors is called a basis for $V$. The number of vectors in a basis for $V$ is said to be the dimension of $V$.

It follows that the vectors in a basis must be linearly independent, else it would be possible to form a smaller spanning set.

Example

Let $V = \mathbb F_2^4$, the set of binary 4-tuples, and let $G$ be a set of vectors with $W = span(G)$:

$$G = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}, \quad W = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}$$

Observe that $G$ is a spanning set for $W$, but not for all of $V$. $G$ is not a basis for $W$ as it has some redundancy in it, since the third vector is a linear combination of the first two:

$$\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

so the vectors in $G$ are not linearly independent, and the third vector in $G$ can be removed, resulting in the set:

$$G' = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}$$

which does have $span(G') = W$.

Notice that no spanning set for $W$ has fewer vectors than $G'$, so $dim(W) = 2$.

  • Let $V$ be a $k$-dimensional vector space defined over a finite scalar field $\mathbb F_q$. The number of elements in $V$ is $|V| = q^k$. Since every vector $\mathbf v \in V$ can be written as

$$\mathbf v = a_1 \mathbf v_1 + ... + a_k\mathbf v_k$$

the number of elements in $V$ is the number of distinct $k$-tuples $(a_1, a_2, ..., a_k)$ that can be formed, which is $q^k$.

  • Let $V$ be a vector space over a scalar field $\mathbb F$, and let $W \subseteq V$ be a subset such that for any $\mathbf{w_1}, \mathbf{w_2} \in W$

$$a \mathbf{w_1} + b \mathbf{w_2} \in W$$

for any $a, b \in \mathbb F$. Then, $W$ is said to be a vector subspace of $V$.

  • Let $\mathbf u = (u_0, u_1, ..., u_{n-1}), \mathbf v = (v_0, v_1, ..., v_{n-1})$ be vectors in a vector space $V$ where $u_i, v_i \in \mathbb F$. The inner product is a function of two such vectors which returns a scalar, expressed as $\langle \mathbf u, \mathbf v \rangle$ and defined:

$$\langle \mathbf u, \mathbf v \rangle = \mathbf u \cdot \mathbf v = \sum_{i=0}^{n-1} u_i \cdot v_i$$

from which we can verify the following properties:

  1. Commutativity: $\mathbf u \cdot \mathbf v = \mathbf v \cdot \mathbf u$
  2. Associativity: $a \cdot (\mathbf u \cdot \mathbf v) = (a \cdot \mathbf u) \cdot \mathbf v$
  3. Distributivity: $\mathbf u \cdot (\mathbf v + \mathbf w) = \mathbf u \cdot \mathbf v + \mathbf u \cdot \mathbf w$

The inner or dot product is used to describe the notion of orthogonality: two vectors are said to be orthogonal if

$$\mathbf u \cdot \mathbf v = 0,$$

indicated $\mathbf u \perp \mathbf v$.

Combined with the notion of vector spaces, we get the concept of a dual space. If $W$ is a $k$-dimensional subspace of a vector space $V$, the set of all vectors $\mathbf u \in V$ which are orthogonal to all vectors of $W$ is the orthogonal complement, dual space, or null space, denoted $W^\perp$, given by

$$W^\perp = \{ \mathbf u \in V \mid \mathbf u \cdot \mathbf w = 0 \; \forall \mathbf w \in W \}$$

Example

Let $V = \mathbb F_2^4$ and

$$W = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}, \quad W^\perp = span \left( \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix} \right\} \right)$$

with $dim(W^\perp) = 2$. This example gives way to the following theorem:

Let $V$ be a finite-dimensional vector space of $n$-tuples, $\mathbb F^n$, with a subspace $W$ of dimension $k$. Let $U = W^\perp$ be the dual of $W$; then

$$dim(W^\perp) = dim(V) - dim(W) = n - k$$

Proof

Let g1,g2,...,gkg_1, g_2, ..., g_k be a basis for WW such that G=[g1  g2    gk]G = [g_1 \; g_2 \; \cdots \; g_k], a rank kk matrix such that the dimension of the row and column spaces are both kk. Any vector wW\mathbb w \in W is of the form w=Gx,xFk\mathbf w = G\mathbf x, \mathbf x\in \mathbb F^k.

Any vector uU\mathbf u \in U must satisfy uGx=0;  xFk\mathbf u^\perp G \mathbf x =0; \; \forall \mathbf x \in \mathbb F^k.

Let {h1,h2,...,hr,f1,f2,...,fnr}\{ \mathbf h_1, \mathbf h_2, ..., \mathbf h_r, \mathbf f_1, \mathbf f_2, ..., \mathbf f_{n-r} \} be a basis for w\mathbf w^\perp extended to the whole nn-dimensional space. Every vector v\mathbf v in the row space of GG is expressed as v=bG\mathbf v = \mathbf b^\perp G for some bV\mathbf b \in V.

But since {h1,h2,...,hr,f1,f2,...,fnr}\{ \mathbf h_1, \mathbf h_2, ..., \mathbf h_r, \mathbf f_1, \mathbf f_2, ..., \mathbf f_{n-r} \} spans VV, b\mathbf b must be a linear combination of these vectors:

b=a1h1+a2h2+...+arhr+ar+1hr+1+...+anhnr\mathbf b = a_1\mathbf h_1 + a_2\mathbf h_2 + ... + a_r\mathbf h_r + a_{r+1}\mathbf h_{r+1} + ... + a_n\mathbf h_{n-r}

so a vector v\mathbf v in the row space of GG can be written

v=a1h1TG+...+anfnrTG\mathbf v = a_1 \mathbf h_1^T G + ... + a_n \mathbf f_{n-r}^T G

from which we observe that the row space of GG is spanned by the vectors

{h1TG,h2TG,...,hrTG,f1TG,...,fnrTG}\{ \mathbf h_1^T G, \mathbf h_2^T G, ..., \mathbf h_r^T G, \mathbf f_1^T G, ..., \mathbf f_{n-r}^T G \}

The vectors \{ \mathbf h_1, \mathbf h_2, ..., \mathbf h_r \} are in W^\perp, so \mathbf h_i^TG = \mathbf 0 for i = 1, 2, ..., r. The remaining vectors \{\mathbf f_1^TG, ..., \mathbf f_{n-r}^TG \} are left to span the k-dimensional row space of G. Hence we must have n - r \geq k.

Furthermore, these vectors are linearly independent: suppose there is a set of coefficients \{ a_i \} such that

a_1 (\mathbf f_1^T G ) + ... + a_{n-r} (\mathbf f_{n-r}^T G ) = \mathbf 0

Then the vector a_1 \mathbf f_1 + ... + a_{n-r} \mathbf f_{n-r} lies in W^\perp; but the \mathbf f_i extend a basis of W^\perp, so no non-trivial combination of them lies in W^\perp, meaning we must have

a_1 \mathbf f_1^T + ... + a_{n-r} \mathbf f_{n-r}^T = 0

and since \{ \mathbf f_i \} are linearly independent, we must have a_1 = a_2 = ... = a_{n-r} = 0. The n - r vectors \{ \mathbf f_1^TG, ..., \mathbf f_{n-r}^TG \} are therefore a linearly independent set in the k-dimensional row space of G, so that

dim \; span(\{ \mathbf f_1^TG, ..., \mathbf f_{n-r}^TG \}) = n - r \leq k

Combined with n - r \geq k from above, this gives n - r = k, and hence dim(W^\perp) = r = n - k.
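This theorem is easy to check numerically for the example W above; a minimal brute-force sketch over \mathbb F_2 (plain Python, no dependencies; the helper names are mine):

```python
from itertools import product

n = 4
W = [(0,0,0,0), (1,0,1,0), (0,1,1,0), (1,1,0,0)]  # the example W, with dim(W) = k = 2

def dot(u, v):
    # inner product over F_2
    return sum(ui * vi for ui, vi in zip(u, v)) % 2

# W_perp collects every vector in F_2^4 orthogonal to all of W
W_perp = [u for u in product((0, 1), repeat=n)
          if all(dot(u, w) == 0 for w in W)]

print(len(W_perp))     # 4 = 2^(n - k), so dim(W_perp) = 2 = n - k
print(sorted(W_perp))  # contains (0,0,0,1) and (1,1,1,0), the basis above
```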

3 | Linear Block Codes

3.1 - Basic Definitions

An [n, k] Block Code \mathcal C over an alphabet of q symbols is a set of q^k n-vectors called codewords or code vectors

  • An n-tuple is (c_0, c_1, ..., c_{n-1}) \in \mathcal A^n, with n elements drawn from the alphabet \mathcal A

Associated with the code is an encoder which maps a message, a k-tuple \mathbf m \in \mathcal A^k, to its associated codeword. To be useful for error correction, we desire a one-to-one correspondence between a message \mathbf m and its codeword \mathbf c for a given code \mathcal C; there might be several such mappings.

Block codes can be exhaustively represented, but for large kk, this would be prohibitively complex to store and decode. This complexity can be reduced by imposing structure on the code, the most common such structure being linearity.

A block code C\mathcal C over a field Fq\mathbb F_q of qq symbols with length nn and qkq^k codewords is a qq-ary [n,k][n, k] Linear Block Code if and only if its qkq^k codewords form a kk-dimensional vector subspace of the vector space of all the nn-tuples Fqn\mathbb F_q^n.

  • nn is said to be the length of the code
  • kk is the dimension of the code
  • R = k/n is the rate of the code

For a linear code, the sum of any two codewords is also a codeword. Furthermore, any linear combination of codewords is a codeword.

The Hamming Weight wt(c)wt(\mathbf c) of a codeword c\mathbf c is the number of non-zero components (usually bits) of the codeword.

  • The minimum weight wminw_{min} of a code C\mathcal C is the smallest Hamming weight of any non-zero codeword:
wmin=mincCc0wt(c)w_{min} = \min_{{\mathbf c \in \mathcal C \atop \mathbf c \neq \mathbf 0}} wt(\mathbf c)

For any linear code C\mathcal C, the minimum distance dmind_{min} satisfies dmin=wmind_{min} = w_{min}. That is, the minimum distance of a linear block code is equal to the minimum weight of its non-zero codewords.

This can be proved by translating the difference of two codewords (a linear combination, and thus another codeword) "to the origin":

dmin=minci,cjCcicjdH(ci,cj)=minci,cjCcicjdH(cicj,cjcj)=mincCc0w(c)d_{min} = \min_{{\mathbf c_i, \mathbf c_j \in \mathcal C \atop \mathbf c_i \neq \mathbf c_j}} d_H (\mathbf c_i, \mathbf c_j) = \min_{{\mathbf c_i, \mathbf c_j \in \mathcal C \atop \mathbf c_i \neq \mathbf c_j}} d_H(\mathbf c_i - \mathbf c_j, \mathbf c_j - \mathbf c_j) = \min_{{\mathbf c \in \mathcal C \atop \mathbf c \neq \mathbf 0}} w(\mathbf c)
  • As an aside, I don't love the degeneracy of the notation here, so I'm just going to scrap it and use w(c)w(\mathbf c) to denote weight from here on out

An [n, k] code with minimum distance d_{min} is denoted [n, k, d]. The random error correcting capability of a code with d_{min} is

t=(dmin1)/2t = \lfloor (d_{min}-1)/2 \rfloor
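To make the w_{min} = d_{min} property and the resulting t concrete, here is a small sketch using a toy [5, 2] binary generator of my own choosing (not a code from these notes):

```python
from itertools import product

G = [(1, 0, 1, 1, 0),   # toy [5, 2] generator, chosen only for illustration
     (0, 1, 0, 1, 1)]
n, k = 5, 2

def encode(m):
    # c = mG over F_2
    return tuple(sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G))

codewords = [encode(m) for m in product((0, 1), repeat=k)]

wt = lambda c: sum(c)
w_min = min(wt(c) for c in codewords if any(c))
d_min = min(wt(tuple((a + b) % 2 for a, b in zip(ci, cj)))
            for ci in codewords for cj in codewords if ci != cj)

assert w_min == d_min        # the linear-code property above
t = (d_min - 1) // 2         # random error correcting capability
print(d_min, t)              # 3 1
```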

3.2 - Generator Matrix Description of Linear Block Codes

Since a linear block code C\mathcal C is a kk-dimensional vector space, there exist kk linearly independent vectors which are designated g0,g1,...,gk1\mathbf g_0, \mathbf g_1, ..., \mathbf g_{k-1} such that every codeword cC\mathbf c \in \mathcal C can be represented as a linear combination of these vectors:

c=m0g0+m1g1+...+mk1gk1,  miFq\mathbf c = m_0\mathbf g_0 + m_1 \mathbf g_1 + ... + m_{k-1}\mathbf g_{k-1}, \; m_i \in \mathbb F_q

Considering the \mathbf g_i as row vectors and stacking them, we form a k \times n matrix G:

G=[g0g1gk1]G = \begin{bmatrix} \mathbf g_0 \\ \mathbf g_1 \\ \vdots \\ \mathbf g_{k-1} \end{bmatrix}

and let \mathbf m = [m_0, m_1, ..., m_{k-1}] such that \mathbf c = \mathbf mG. Thus, every codeword \mathbf c \in \mathcal C has such a representation for some vector \mathbf m. Since the rows of G generate (span) the [n, k] linear code \mathcal C, G is called a Generator Matrix for the code \mathcal C.

c=mG\mathbf c = \mathbf mG can be understood as the encoding operation for the code C\mathcal C and thus only requires storing kk vectors of length nn (rather than the qkq^k vectors that would be required to store all the codewords of a non-linear code).

Note that the representation provided by GG is not unique, as we can devise another GG' via row operations consisting of non-zero linear combinations such that an encoding c=mG\mathbf c = \mathbf mG' maps m\mathbf m to a codeword in C\mathcal C which is not necessarily the same one obtained by GG.

For the [7,4][7,4] Hamming code (used as an example for the rest of this section), we have the following generator matrix:

G=[1101000011010000110100001101]G = \begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 \end{bmatrix}

such that to encode a message m=[1  0  0  1]\mathbf m = [1 \; 0 \; 0 \; 1], we add the first and fourth rows of G(mod2)G (\mod 2) to obtain c=[1  1  0  0  1  0  1]\mathbf c = [1 \; 1 \; 0 \; 0 \; 1 \; 0 \; 1].

Another generator G' can be obtained by the row operation R_1 \leftarrow R_1 + R_2, such that \mathbf c' = \mathbf m G' = [1 \; 0 \; 1 \; 0 \; 0 \; 0 \; 1], which is a different codeword than the original \mathbf c \in \mathcal C.
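Both encodings are easy to reproduce; a minimal numpy sketch, doing arithmetic mod 2:

```python
import numpy as np

G = np.array([[1, 1, 0, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0, 1, 0],
              [0, 0, 0, 1, 1, 0, 1]])

m = np.array([1, 0, 0, 1])
print(m @ G % 2)      # [1 1 0 0 1 0 1], the codeword c above

Gp = G.copy()
Gp[0] = (Gp[0] + Gp[1]) % 2   # the row operation R1 <- R1 + R2
print(m @ Gp % 2)     # [1 0 1 0 0 0 1], a different codeword of the same code
```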

Let \mathcal C be an [n, k] (not necessarily linear) block code. An encoder is systematic if the message symbols m_0, m_1, ..., m_{k-1} may be found explicitly and unchanged in the codeword: there are coordinates i_0, i_1, ..., i_{k-1} (typically sequential: i_0, i_0 + 1, ..., i_0 + k - 1) such that c_{i_0} = m_0, c_{i_1} = m_1, ..., c_{i_{k-1}} = m_{k-1}.

Note that being systematic is a property of the encoder, not the code itself. For a linear block code, the encoding operation defined by G is systematic if an identity matrix can be identified among the rows of G. (Neither G nor G' in the example above is systematic.)

A systematic generator is often expressed:

G=[P  Ik]=[p0,0p0,1p0,nk11000p1,0p1,1p1,nk10100p2,0p2,1p2,nk10010      pk1,0pk1,1pk1,nk10001]G = [P \; I_k] = \begin{bmatrix} p_{0,0} & p_{0,1} & \cdots & p_{0, n - k -1} & & 1 & 0 & 0 & \cdots & 0 \\ p_{1,0} & p_{1,1} & \cdots & p_{1, n - k -1} & & 0 & 1 & 0 & \cdots & 0 \\ p_{2,0} & p_{2,1} & \cdots & p_{2, n - k -1} & & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \;& \;& \;& \vdots & \vdots \\ p_{k-1,0} & p_{k-1,1} & \cdots & p_{k-1, n - k -1} & & 0 & 0 & 0 & \cdots & 1 \\ \end{bmatrix}

where I_k is the k \times k identity matrix and P is a k \times (n-k) matrix which generates parity symbols. The encoding operation is then \mathbf c = \mathbf m[P \; I_k] = [\mathbf m P \; \mathbf m].

The codeword is divided into two components:

  • The part m\mathbf m which consists of the message symbols
  • The part mP\mathbf mP which consists of parity check symbols

Elementary row operations do not change the row span, so the same code is produced. Interchanging columns, however, corresponds to a change in the positions of the message symbols; the resultant code bits are swapped accordingly, but the distance structure of the code as a whole is preserved.

Two linear codes which are the same except for some permutation of the components of the code are said to be equivalent. Let GG and GG' be generator matrices of equivalent codes. Then G,GG, G' are related by:

  • Column permutations
  • Elementary row operations

Given any arbitrary GG, it is possible to put it into the form [P  Ik][P \; I_k] above via Gaussian elimination and pivoting.

Example

For the G of the [7, 4] Hamming code used before, an equivalent generator matrix in systematic form is:

G=[110  1000011  0100111  0010101  0001]G'' = \begin{bmatrix} 1 & 1 & 0 & \; & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & \; & 0 & 1 & 0 & 0 \\ 1 & 1 & 1 & \; & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & \; & 0 & 0 & 0 & 1 \\ \end{bmatrix}

and for the Hamming code with this generator, let the message m=[m0,m1,m2,m3]\mathbf m = [m_0, m_1, m_2, m_3] with the corresponding codeword c=[c0,c1,...,c6]\mathbf c = [c_0, c_1, ..., c_6]. Then, the parity bits are obtained via:

\begin{aligned} c_0 &= m_0 + m_2 + m_3 \\ c_1 &= m_0 + m_1 + m_2 \\ c_2 &= m_1 + m_2 + m_3 \\ \end{aligned}

And the systematically encoded bits are:

c3=m0c4=m1c5=m2c6=m3\begin{aligned} c_3 = m_0 \\ c_4 = m_1 \\ c_5 = m_2 \\ c_6 = m_3 \end{aligned}

3.3 - Parity Check Matrix for Dual Codes

Since a linear code \mathcal C is a k-dimensional vector subspace of \mathbb F_q^n, there must be a dual space to \mathcal C of dimension n-k. The dual space to an [n, k] code \mathcal C of dimension k is the [n, n-k] dual code of \mathcal C, denoted \mathcal C^\perp.

  • It is possible for a code to satisfy \mathcal C = \mathcal C^\perp; such codes are called self-dual

As a vector space, C\mathcal C^\perp has a basis which can be denoted by {h0,h1,...,hnk1}\{\mathbf h_0, \mathbf h_1, ..., \mathbf h_{n - k -1}\}, with a matrix using these basis vectors as rows:

H=[h0h1hnk1]H = \begin{bmatrix} \mathbf h_0 \\ \mathbf h_1\\ \vdots \\ \mathbf h_{n - k -1} \end{bmatrix}

which is known as the parity check matrix for \mathcal C. The parity check matrix and the generator matrix for a code satisfy GH^T = \mathbf 0. The parity check matrix also has the following important property:

Let \mathcal C be an [n, k] linear code over \mathbb F_q and H be a parity check matrix for \mathcal C. A vector \mathbf v \in \mathbb F_q^n is a codeword if and only if \mathbf vH^T = \mathbf 0. That is, the codewords in \mathcal C lie in the (left) nullspace of H.

Example

For the systematic generator GG'' above, the parity check matrix is

H=[100  1011010  1110001  0111]H = \begin{bmatrix} 1 & 0 & 0 & \; & 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & \; & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & \; & 0 & 1 & 1 & 1 \\ \end{bmatrix}

and it can be verified that G''H^T = \mathbf 0. Furthermore, even though G is not in systematic form, it still generates the same code, so GH^T = \mathbf 0 as well. H is a generator matrix for a [7, 3] code, the dual to the [7, 4] Hamming Code.

This condition cHT=0\mathbf cH^T = \mathbf 0 imposes linear constraints among the bits of c\mathbf c called the Parity Check Equations.

The parity check matrix HH above yields

c0+c3+c5+c6=0    c0=c3+c5+c6c1+c3+c4+c5=0    c1=c3+c4+c5c2+c4+c5+c6=0    c2=c4+c5+c6\begin{aligned} c_0 + c_3 + c_5 + c_6 = 0 \implies c_0 = c_3 + c_5 + c_6 \\ c_1 + c_3 + c_4 + c_5 = 0 \implies c_1 = c_3 + c_4 + c_5 \\ c_2 + c_4 + c_5 + c_6 = 0 \implies c_2 = c_4 + c_5 + c_6 \\ \end{aligned}
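A quick numpy sketch of the membership test \mathbf vH^T = \mathbf 0 with this H (the test vectors are my own):

```python
import numpy as np

H = np.array([[1, 0, 0, 1, 0, 1, 1],
              [0, 1, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 1, 1]])

def is_codeword(v):
    # v is a codeword iff v H^T = 0 (mod 2)
    return not np.any(v @ H.T % 2)

print(is_codeword(np.array([1, 1, 0, 1, 0, 0, 0])))  # True: m = [1,0,0,0] under G''
print(is_codeword(np.array([1, 1, 1, 1, 0, 0, 0])))  # False: one bit flipped
```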

The parity check matrix for a code (systematic or not) provides information about the minimum distance of the code.

Let a linear block code \mathcal C have a parity check matrix H. The minimum distance d_{min} of \mathcal C is equal to the smallest positive number of columns of H which are linearly dependent. That is, all sets of d_{min} - 1 or fewer columns are linearly independent, and there is some set of d_{min} columns which is linearly dependent.

Example

For the parity check matrix HH of GG'', the parity check condition is:

\begin{aligned} \mathbf cH^T &= [c_0, c_1, ..., c_6] \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \\ \end{bmatrix} \\ &= c_0[1, 0, 0] + c_1[0, 1, 0] + c_2[0, 0, 1] \\ & + c_3[1, 1, 0] + c_4[0, 1, 1] + c_5[1, 1, 1] + c_6[1, 0, 1] \end{aligned}

The first, second, and fourth rows of H^T sum to zero and are thus linearly dependent, while no smaller set of rows of H^T is linearly dependent, so d_{min} = 3.

3.3.1 - Simple Bounds on Block Codes

The Singleton Bound: the minimum distance for an [n, k] linear code is bounded by d_{min} \leq n - k + 1.

Proof: An [n, k] linear code has a parity check matrix with n - k linearly independent rows. Since the row rank of a matrix is equal to its column rank, rank(H) = n - k. Any collection of n - k + 1 columns must therefore be linearly dependent. Thus, the minimum distance cannot be larger than n - k + 1.

  • A code for which the minimum distance is n - k + 1 is called a Maximum Distance Separable (MDS) code.

Hamming Spheres

Around each code "point" is a cloud of points corresponding to non-codewords. For a q-ary code, there are (q - 1)n vectors at Hamming distance 1 from the codeword,

(q1)2(n2)(q-1)^2 {n \choose 2}

vectors at Hamming distance 2, and

(q1)(n)(q - 1)^\ell {n \choose \ell}

vectors at distance \ell from the codeword.

Let C\mathcal C be a code of length n=4n = 4 over GF(3)GF(3), so q=3q = 3. Then the vectors at a Hamming distance of 1 from the codeword [0,0,0,0][0, 0, 0, 0] are:

[1,0,0,0],  [0,1,0,0],  [0,0,1,0],  [0,0,0,1],[2,0,0,0],  [0,2,0,0],  [0,0,2,0],  [0,0,0,2]\begin{aligned} [1, 0, 0, 0], \; [0, 1, 0, 0], \; [0, 0, 1, 0], \; [0, 0, 0, 1], \\ [2, 0, 0, 0], \; [0, 2, 0, 0], \; [0, 0, 2, 0], \; [0, 0, 0, 2] \end{aligned}

The vectors at Hamming distance less than or equal to t from a codeword form a "sphere" called the Hamming Sphere of radius t. The number of vectors in a Hamming sphere of radius t, for a code of length n over an alphabet of q symbols, is denoted V_q(n, t):

Vq(n,t)=j=0t(nj)(q1)jV_q(n, t) = \sum_{j = 0}^t {n \choose j}(q - 1)^j

The bounded distance decoding sphere of a codeword is the Hamming sphere of radius

t=(dmin1)/2t = \lfloor (d_{min} - 1) / 2 \rfloor

around that codeword. Equivalently, a code whose random error correction capability is tt must have a minimum distance between codewords satisfying dmin2t+1d_{min} \geq 2t + 1.

The redundancy of a code is the number of parity symbols in a codeword: r=nlogqMr = n - \log_q M where MM is the number of codewords. For a linear code, we have

M=qk    r=nkM = q^k \implies r = n - k

Hamming Bound

A tt-random error correcting qq-ary code C\mathcal C must have redundancy rr satisfying rlogqVq(n,t)r \geq \log_q V_q(n, t).

Proof: Each of the M spheres in \mathcal C has radius t with no overlap, else it would not be possible to decode t errors. The total number of points enclosed by the spheres must be less than or equal to q^n, thus

MVq(n,t)qn    qn/MVq(n,t)MV_q(n, t) \leq q^n \implies q^n/M \geq V_q(n, t)

from which the result follows by taking \log_q of both sides.

A code satisfying this bound with equality is said to be a perfect code, a designation describing how the points fall into the spheres rather than the performance of the code. The entire set of perfect codes is:

  1. The set of all nn-tuples with minimum distance of 1, with t=0t = 0
  2. Odd-length binary repetition codes
  3. Linear Binary Hamming codes or other non-linear codes with equivalent parameters
  4. The Golay code [23,12,7][23, 12, 7]
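A small sketch computing V_q(n, t) and checking the Hamming bound; consistent with item 3 above, the binary [7, 4] Hamming code meets the bound with equality:

```python
from math import comb, log

def V(q, n, t):
    # number of vectors in a Hamming sphere of radius t
    return sum(comb(n, j) * (q - 1) ** j for j in range(t + 1))

q, n, k, t = 2, 7, 4, 1
r = n - k                          # redundancy of a linear code
print(V(q, n, t))                  # 8
print(r, log(V(q, n, t), q))       # 3 3.0: bound met with equality (perfect)
```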

3.4 - Error Detection and Correction Over Hard-Input Channels

Let \mathbf r be an n-vector over \mathbb F_q and let H be a parity check matrix for a code \mathcal C. The vector \mathbf s = \mathbf r H^T is called the Syndrome of \mathbf r. \mathbf{s = 0} if and only if \mathbf r is a codeword of \mathcal C; otherwise \mathbf s indicates the presence of errors in the received vector \mathbf r.

3.4.1 - Error Detection

Suppose a codeword \mathbf c in a linear block code \mathcal C over \mathbb F_q is transmitted through a channel and the n-vector \mathbf r is received. We write \mathbf{r = c + e}, where \mathbf e is the error vector, hopefully equal to the zero vector. However, the received vector \mathbf r could be any vector in \mathbb F^n_q, since any error pattern is possible. If H is the parity check matrix for \mathcal C, then the syndrome is given by

\mathbf{s = r}H^T = (\mathbf{c + e})H^T = \mathbf e H^T

and s=0\mathbf{s = 0} if r\mathbf r is a codeword. If s0\mathbf{s \neq 0}, then an error has been detected.

3.4.2 - Error Correction: the Standard Array

Using Maximum Likelihood Detection, decoding a vector r\mathbf r consists of selecting the codeword cC\mathbf c \in \mathcal C closest to r\mathbf r in terms of Hamming distance:

c^=argmincCdH(c,r)\hat{\mathbf c} = \arg \min \limits_{\mathbf c \in \mathcal C} d_H(\mathbf{c, r})

Let the set of codewords in the code be c0,c1,...,cM1{\mathbf{c_0, c_1, ..., c}_{M-1}}, where M=qkM = q^k, with c0=0\mathbf{c_0 = 0}, and ViV_i denote the set of nn-vectors closer to ci\mathbf c_i than any other codeword (with ties broken arbitrarily). The set {Vii[0,M1]}\{V_i | i \in [0, M-1] \} partitions the space of nn-vectors into MM disjoint subsets. If r\mathbf r falls into ViV_i, then being closer to ci\mathbf c_i than any other codeword, r\mathbf r is decoded to ci\mathbf c_i, so we simply must define all ViV_i.

The Standard Array is a representation of the partition \{V_i\}: a two-dimensional array whose columns are the V_i. It is constructed by placing each codeword \mathbf c_i at the head of its own set V_i; writing down the set of codewords thus gives the first row of the array. From the remaining vectors in \mathbb F_q^n, find \mathbf e_1 of smallest weight, which must lie in the set V_0 since it is closest to \mathbf{c_0 = 0}. But

dH(e1+ci,ci)=dH(e1,0);id_H(\mathbf{e_1 + c_i, c_i}) = d_H(\mathbf{e_1, 0}); \forall i

so the vector \mathbf e_1 + \mathbf c_i must also lie in V_i for each i, so \mathbf e_1 + \mathbf c_i is placed into each V_i. The vectors are included in their respective columns of the standard array to form the second row. We continue this procedure, selecting an unused vector of minimum weight and adding it to each codeword to form the next row, until all q^n possible vectors have been used.

In summary:

  1. Write down all codewords of C\mathcal C
  2. Select from the remaining unused vectors of \mathbb F_q^n one of minimal weight, \mathbf e. Write \mathbf e in the column under the all-zero codeword, then add \mathbf e to each codeword in turn, writing the sum in the column under the corresponding codeword
  3. Repeat (2) until all qnq^n vectors in Fqn\mathbb F_q^n have been placed in the array.
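The construction is exponential in n, but for toy codes it can be carried out directly; a sketch for the [7, 3] code of the following example:

```python
from itertools import product

G = [(0, 1, 1, 1, 1, 0, 0),   # the [7, 3] generator used below
     (1, 0, 1, 1, 0, 1, 0),
     (1, 1, 0, 1, 0, 0, 1)]
n, k = 7, 3

add = lambda u, v: tuple((a + b) % 2 for a, b in zip(u, v))

codewords = []
for m in product((0, 1), repeat=k):
    c = (0,) * n
    for mi, g in zip(m, G):
        if mi:
            c = add(c, g)
    codewords.append(c)

array, used = [codewords], set(codewords)
while len(used) < 2 ** n:
    # next coset leader: an unused vector of minimum weight (ties arbitrary)
    e = min((v for v in product((0, 1), repeat=n) if v not in used), key=sum)
    row = [add(e, c) for c in codewords]
    array.append(row)
    used.update(row)

print(len(array))   # 16 = 2^(n-k) rows, as in the table below
```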

Example

For [7,3][7, 3], with

G=[011  1100101  1010110  1001]G = \begin{bmatrix} 0 & 1 & 1 & \; & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & \; & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & \; & 1 & 0 & 0 & 1 \\ \end{bmatrix}

the codewords are:

R100000000111100101101011001101101001101010101100110001111\begin{array}{cc:cccccc} R_1 & 0000000 & 0111100 & 1011010 & 1100110 & 1101001 & 1010101 & 0110011 & 0001111 \end{array}

from the remaining 7-tuples, one of minimum weight is selected (e.g. 1000000), and the second row is obtained by adding it to each codeword:

R210000001111100001101001001100101001001010111100111001111\begin{array}{cc:cccccc} R_2 & 1000000 & 1111100 & 0011010 & 0100110 & 0101001 & 0010101 & 1110011 & 1001111 \end{array}

and proceed until all 2n2^n vectors are used, selecting an unused vector of minimum weight and adding it to all codewords:

The Complete Standard Array for [7, 3]

1000000001111001011010110011011010011010101011001100011112100000011111000011010010011001010010010101111001110011113010000000111001111010100011010010011110101001001101011114001000010110001001010111011011110011000101010001100111115000100001101001010010110111011000011011101011101100001116000010001110001011110110001011011011010001011011100010117000001001111101011000110010011010111010111011000100011018000000101111011011011110011111010001010100011001000011109110000010111000111010000011000010010110101101001111011111010100001101100000101001101100111001000010111000111011111110110000000110011010101010110101100111001010000011011111112100100011101000010010010111001000010011101111101110001111301010000010100111001010011101000001111110100110110100111140011000010010010000101111110111000110011010101011001011115100010011110000011110010001001011010010001111011110010111611100001001100010101000101100011001010010110000111111111\begin{array}{cc:ccccccc} 1 & 0000000 & 0111100 & 1011010 & 1100110 & 1101001 & 1010101 & \mathbf{0110011} & 0001111\\ \hline 2 & 1000000 & 1111100 & 0011010 & 0100110 & 0101001 & 0010101 & 1110011 & 1001111 \\ 3 & 0100000 & 0011100 & 1111010 & 1000110 & 1001001 & 1110101 & 0010011 & 0101111 \\ 4 & 0010000 & 1011000 & 1001010 & 1110110 & 1111001 & 1000101 & 0100011 & 0011111 \\ 5 & 0001000 & 0110100 & 1010010 & 1101110 & 1100001 & 1011101 & 0111011 & 0000111 \\ 6 & 0000100 & 0111000 & 1011110 & 1100010 & 1101101 & 1010001 & 0110111 & 0001011 \\ 7 & 0000010 & 0111110 & 1011000 & 1100100 & 1101011 & 1010111 & 0110001 & 0001101 \\ 8 & 0000001 & 0111101 & 1011011 & 1100111 & 1101000 & 1010100 & 0110010 & 0001110 \\ \hline 9 & 1100000 & 1011100 & 0111010 & 0000110 & 0001001 & 0110101 & 1010011 & 1101111 \\ 10 & 1010000 & 1101100 & 0001010 & 0110110 & 0111001 & 0000101 & 1100011 & 1011111 \\ 11 & 0110000 & 0001100 & 1101010 & 1010110 & 1011001 & 1100101 & 0000011 & 0111111 \\ 12 & 1001000 & 1110100 & 0010010 & 0101110 & 0100001 & 0011101 & 1111011 & 1000111 \\ 13 & \mathbf{0101000} & 0010100 & 1110010 & 1001110 & 1000001 & 1111101 & \mathbf{0011011} & 0100111 \\ 14 & 0011000 & 0100100 & 1000010 & 1111110 & 1110001 & 1001101 & 0101011 & 0010111 \\ 15 & 1000100 & 1111000 & 0011110 & 0100010 & 0101101 & 0010001 & 1110111 & 1001011 \\ \hline 16 & 1110000 & 1001100 & 0101010 & 0010110 & 0011001 & 0100101 & 1000011 & 1111111 \\ \end{array}

From which we can draw the following observations:

  1. There are qkq^k codeword columns and qnq^n possible vectors, so there are qnkq^{n-k} rows in the standard array. Therefore, an [n,k][n, k] code is capable of correcting qnkq^{n-k} different error patterns

  2. The difference (sum, over GF(2)) of any two vectors in the same row is a code vector: vectors in a row have the form \mathbf c_i + \mathbf e and \mathbf c_j + \mathbf e, and their difference \mathbf c_i - \mathbf c_j is a codeword since the code is a linear subspace.

  3. No two vectors in the same row are identical, otherwise ci+e=cj+e\mathbf c_i + \mathbf e = \mathbf c_j + \mathbf e with iji \neq j, which is impossible.

  4. Every vector appears exactly once in the standard array. We know every vector must appear at least once, by construction. If a vector appears in both the \ell-th and m-th rows, we must have

ci+e=cj+em\mathbf c_i + \mathbf e_\ell = \mathbf c_j + \mathbf e_m

and for <m\ell < m, we have em=e+cicj=e+ck\mathbf e_m = \mathbf e_\ell + \mathbf c_i - \mathbf c_j = \mathbf e_\ell + \mathbf c_k for some kk meaning em\mathbf e_m is on the \ell-th row of the array, which is a contradiction.

Rows of the standard array are cosets of the form \mathbf e + \mathcal C = \{ \mathbf{e + c} | \mathbf c \in \mathcal C \}; that is, rows are translations of the code.

  • Vectors in the first column are called coset leaders, representing the error patterns that can be corrected by the code under this decoding strategy. The decoder corresponding to the table above can correct:
    • all errors of weight 1,
    • seven errors of weight 2,
    • and 1 error pattern of weight 3

To actually use the standard array, we locate the received vector \mathbf r, then identify \mathbf{r = e + c} for some error \mathbf e which is a coset leader (left column) and a codeword \mathbf c in the top row. Since our standard array construction fills the table in order of increasing error weight, the resulting codeword is the ML decision.

Example

Let \mathbf r = [0,0,1,1,0,1,1]. Its coset leader is \mathbf e = [0, 1, 0, 1, 0, 0, 0] (shown in bold in the table), which corresponds to \mathbf m = [0, 1, 1] since the generator is systematic.

Example

Not all {7 \choose 2} = 21 patterns of two errors are correctable under this example code and standard array: only seven patterns of weight 2 are, along with one pattern of weight 3. The minimum distance is 4, since the minimum weight of the non-zero codewords is 4; thus the code is only guaranteed to correct \lfloor (4-1)/2 \rfloor = 1 error, though it does correct all of those.

A Complete Error Correcting Decoder is a decoder that, given a received word \mathbf r, selects the codeword \mathbf c which minimizes d_H(\mathbf{r, c}). If a standard array is used as the decoding mechanism, then complete decoding is achieved. On the other hand, if the array is only filled out through the coset leaders of weight t and all further rows are omitted, then a bounded distance decoder is obtained.

A t-error correcting Bounded Distance Decoder selects the codeword \mathbf c given the received vector \mathbf r if d_H(\mathbf {r, c}) \leq t. If no such \mathbf c exists, a decoder failure occurs.

  • E.g., if only rows 1 through 8 in the example standard array above were used, an \mathbf r in rows 9-16 would result in failure, since those rows correspond to error patterns of weight 2 and 3.

A perfect code is one for which there are no "leftover" rows in its standard array: all {n \choose t} error patterns of weight t and all lighter patterns appear as coset leaders, so the bounded distance decoder is also the ML decoder.

The obvious drawback of the standard array decoder is the memory required to store the array of error patterns. For a not-unreasonable [256, 200] binary code \mathcal C, the standard array would require 2^{256} vectors of length 256 bits to be stored, and every decoding operation would require searching this enormous table.

The first step towards reducing storage and search complexity (still insufficient to be practical) is Syndrome Decoding. For a vector e+c\mathbf {e + c} in the standard array, the syndrome is s=(e+c)HT=eHT\mathbf{s = (e + c)}H^T = \mathbf eH^T. In fact, every vector in the coset has the same syndrome, so we need only store syndromes and their associated patterns: a lookup table with qnkq^{n-k} rows, but only two columns, necessarily smaller than all but the trivial standard array; but still largely impractical!

Nonetheless, with such a table, we could decode:

  1. Compute the syndrome \mathbf{s = r}H^T
  2. Lookup the corresponding error pattern e\mathbf e
  3. Then c=r+e\mathbf{c = r + e}

Example

For the [7, 3] code above, with parity check matrix

H=[1000011010010100101100001111]H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 \end{bmatrix}

The syndrome decoding table is:

| Error | Syndrome |
| --- | --- |
| 0000000 | 0000 |
| 1000000 | 1000 |
| 0100000 | 0100 |
| 0010000 | 0010 |
| 0001000 | 0001 |
| 0000100 | 0111 |
| 0000010 | 1011 |
| 0000001 | 1101 |
| 1100000 | 1100 |
| 1010000 | 1010 |
| 0110000 | 0110 |
| 1001000 | 1001 |
| **0101000** | **0101** |
| 0011000 | 0011 |
| 1000100 | 1111 |
| 1110000 | 1110 |

And for r=[0011011]\mathbf r = [0011011], we have s=rHT=[0101]\mathbf{s = r}H^T = [0101], so e=[0101000]\mathbf e = [0101000]. The decoded codeword is then c^=[0011011]+[0101000]=[0110011]\hat{\mathbf c} = [0011011] + [0101000] = [0110011].
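A minimal numpy sketch of the whole procedure (build the table, then decode a single-bit error); note that ties among minimum-weight patterns with the same syndrome are broken arbitrarily here, so for received vectors equidistant from two codewords the output may legitimately differ from the table above:

```python
import numpy as np
from itertools import product

H = np.array([[1, 0, 0, 0, 0, 1, 1],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1]])
n = 7

# Syndrome table: keep the lightest error pattern seen for each syndrome
table = {}
for e in sorted(product((0, 1), repeat=n), key=sum):
    s = tuple(np.array(e) @ H.T % 2)
    table.setdefault(s, np.array(e))

c = np.array([0, 1, 1, 1, 1, 0, 0])   # a codeword of the [7, 3] code
r = c.copy(); r[2] ^= 1               # flip one bit
s = tuple(r @ H.T % 2)
print(s)                              # (0, 0, 1, 0)
print((r + table[s]) % 2)             # [0 1 1 1 1 0 0]: c recovered
```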

Nevertheless, additional algebraic structure must be imposed to reduce the complexity and resource consumption of decoding, as syndrome tables alone still fail to sufficiently reduce the overall complexity.

3.5 - Weight Distribution of Codes and Their Duals

Let \mathcal C be an [n, k] code, and A_i denote the number of codewords of weight i in \mathcal C; then the set of coefficients \{ A_0, A_1, ..., A_n\} is called the weight distribution for the code, which is conveniently represented as a polynomial:

A(z) = A_0 + A_1z + A_2z^2 + \cdots + A_nz^n

called the weight enumerator, which is basically the zz-transform of the weight distribution sequence.

Example

For the [7, 3] code of the earlier standard array example, there is one codeword of weight 0 and the remaining seven have weight 4, so A_0 = 1, A_4 = 7: A(z) = 1 + 7z^4.

The MacWilliams Identity states that, for an [n,k][n, k] linear block code C\mathcal C over Fq\mathbb F_q with weight enumerator A(z)A(z), and B(z)B(z) being the weight enumerator of C\mathcal C^\perp, then:

B(z)=qk(1+(q1)z)nA(1z1+(q1)z)B(z) = q^{-k}(1+(q-1)z)^n A\Bigg(\frac{1 - z}{1 + (q-1)z}\Bigg)

which allows us to characterize the dual of the code, or rearrange to characterize AA:

A(z)=q(nk)(1+(q1)z)nB(1z1+(q1)z)A(z) = q^{-(n-k)}(1+(q-1)z)^n B\Bigg(\frac{1 - z}{1 + (q-1)z}\Bigg)
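A symbolic check of the identity with sympy, using the binary [7, 4] Hamming code's enumerator A(z) = 1 + 7z^3 + 7z^4 + z^7 (it reappears in section 3.7 below); the result is the simplex enumerator from section 3.6:

```python
import sympy as sp

z = sp.symbols('z')
q, n, k = 2, 7, 4
A = 1 + 7*z**3 + 7*z**4 + z**7   # weight enumerator of the [7,4] Hamming code

# MacWilliams: B(z) = q^{-k} (1 + (q-1)z)^n A((1 - z)/(1 + (q-1)z))
B = sp.Rational(1, q**k) * (1 + (q - 1)*z)**n * A.subs(z, (1 - z) / (1 + (q - 1)*z))
print(sp.expand(sp.cancel(B)))   # 7*z**4 + 1: the [7,3] dual (simplex) code
```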

The proof of this theorem is ugly, so I've left it out, but it relies on the Hadamard Transform. For a function f defined on \mathbb F_2^n, the Hadamard Transform \hat f of f is:

f^(u)=vF2n(1)u,vf(v)=vF2n(1)i=0n1uivif(v),  uF2n\hat f (\mathbf u) = \sum_{\mathbf v \in \mathbb F_2^n} (-1)^{\langle \mathbf{u, v} \rangle} f(\mathbf v) = \sum_{\mathbf v \in \mathbb F_2^n} (-1)^{\sum_{i=0}^{n-1}u_iv_i} f(\mathbf v), \; \mathbf u \in \mathbb F_2^n

where the sum is taken over all 2^n tuples \mathbf v = (v_0, v_1, ..., v_{n-1}), v_i \in \mathbb F_2; the transform can be generalized to larger fields.

This gives way to the following lemma: if C\mathcal C is an [n,k][n, k] binary linear code, and ff is a function defined over F2n\mathbb F_2^n, then

\sum_{\mathbf u \in \mathcal C^\perp} f(\mathbf u) = \frac{1}{|\mathcal C|} \sum_{\mathbf u \in \mathcal C} \hat f(\mathbf u)

3.6 - Hamming Codes and Their Duals

For any integer m \geq 2, a (2^m - 1, 2^m - m - 1, 3) binary code may be defined by its m \times n parity check matrix H, whose n = 2^m - 1 columns are the binary representations of 1, ..., n in order. Codes with this structure are called Hamming Codes.

Example

For m=4m = 4, we get

H=[101010101010101011001100110011000111100001111000000011111111]H = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ \end{bmatrix}

as the parity check matrix for a [15, 11] Hamming code; the columns can be reordered to form an equivalent code such that the identity matrix occupies the first m columns of H:

H=[1000  110110101010100  101101100110010  011100011110001  00001111111]H = \begin{bmatrix} 1 & 0 & 0 & 0 & \; & 1 & 1 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & \; & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & \; & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & \; & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}

and it is clear from this form that for any mm, there exist three columns which add to 00 e.g.,

[1000],[0100],[1100]\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}

so the minimum distance is 33, implying that Hamming codes are capable of correcting one bit error in a block and/or detecting two bit errors.
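The construction is easy to script; a sketch building the m \times (2^m - 1) matrix by listing binary representations of 1, ..., n as columns (LSB in the first row, matching the m = 4 example):

```python
import numpy as np

def hamming_H(m):
    n = 2 ** m - 1
    # column j is the binary representation of j, LSB in row 0
    return np.array([[(j >> i) & 1 for j in range(1, n + 1)]
                     for i in range(m)])

print(hamming_H(3))   # 3 x 7, a parity check matrix for the [7, 4] code
print(hamming_H(4))   # 4 x 15, matching the m = 4 example above
```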

The dual of a [2^m - 1, 2^m - m - 1] Hamming code is a [2^m - 1, m] code called a simplex code, or Maximal-Length Feedback Shift Register code.

Example

The parity check matrix for the [7,4][7, 4] code can be used as the generator for the [7,3][7, 3] simplex code with code words:

0000000  10010110101110  00101111100101  10111000111001  1110010\begin{aligned} 0000000 \; 1001011\\ 0101110 \; 0010111 \\ 1100101 \; 1011100 \\ 0111001 \; 1110010 \\ \end{aligned}

which all, excluding the zero-tuple, have weight 4. In general, all non-zero codewords of the [2^m - 1, m] simplex code have weight 2^{m-1}, and every pair of codewords is at a distance of 2^{m-1} apart.

Example

For m = 2, the codewords \{000, 101, 011, 110\} form a tetrahedron; thus the weight enumerator of the dual code is

B(z)=1+(2m1)z2m1B(z) = 1 + (2^m -1)z^{2^{m-1}}

and using the inverse relation, we find that the weight distribution of the original Hamming code is:

A(z)=1n+1[(1+z)n+n(1z)(1z2)(n1)/2]A(z) = \frac{1}{n+1}[(1 + z)^n + n(1-z)(1-z^2)^{(n-1)/2}]

3.7 - Performance of Hamming Codes

There are many measures of the error detecting and correcting capabilities of codes at the output of the channel decoder:

  • P(E): probability of decoder error or word error rate - the probability that the codeword at the output of the decoder is not equal to the codeword at the encoder input

  • P_b(E): probability of bit error or bit error rate - the probability that the decoded message bits (extracted from a decoded codeword of a binary code) are not the same as the encoded message bits

  • Pu(E)P_u(E): probability of undetected codeword error

  • Pd(E)P_d(E): probability of detected codeword error - one or more errors detected

  • PubP_{u_b}: probability of undetected bit error rate - the probability that a decoded message bit is in error and is contained within a codeword corrupted by an undetected error

  • PdbP_{d_b}: probability of detected bit error rate - probability that a received message bit is in error and contained within a codeword corrupted by a detected error

  • P(F): probability of decoder failure - the probability that the decoder is unable to decode the received vector, but able to determine that it is unable to decode

3.7.1 - Error Detection Performance

All errors of weight up to dmin1d_{min} -1 can be detected, so for computing the above probabilities, we consider patterns of weight dmind_{min} and above.

If a codeword \mathbf c of a linear code is transmitted and the error pattern happens to be a codeword, \mathbf{e = c'}, then the received vector \mathbf{r = c + c'} is also a codeword, and the error goes undetected. Thus the probability of undetected error is the probability that the error pattern itself is a codeword.

To quantify this possibility, we consider only errors in transmission of binary codes over a BSC with crossover probability p. The probability of any particular pattern of j errors in a codeword is p^j(1-p)^{n-j}. Recalling that A_j is the number of codewords in a code \mathcal C of weight j, the probability that j errors form a codeword is A_jp^j(1-p)^{n-j}, and the probability of an undetectable error in a codeword is then

Pu(E)=j=dminnAjpj(1p)njP_u(E) = \sum_{j = d_{min}}^n A_jp^j(1-p)^{n-j}

The probability of a detected codeword error is the probability that one or more errors occur minus the probability that the error is undetected:

Pd(E)=j=1n(nj)pj(1p)njPu(E)=1(1p)nPu(E)P_d(E) = \sum_{j = 1}^n {n \choose j} p^j(1-p)^{n-j} - P_u(E) = 1 - (1 - p)^n - P_u(E)

Since these probabilities rely on knowledge of the weight distribution of the code, which is not always available, it is common to provide bounds on performance instead. For example, an upper bound on the probability of undetected error is the probability of occurrence of any pattern of weight d_{min} or greater. Since there are {n \choose j} different ways that j of the n positions can be changed, we have:

P_u(E) \leq \sum_{j = d_{min}}^n {n \choose j} p^j(1-p)^{n-j}

A bound on Pd(E)P_d(E) is simply Pd(E)1(1p)nP_d(E) \leq 1 - (1 - p)^n

Example

For a [7,4][7, 4] code with A(z)=1+7z3+7z4+z7A(z) = 1 + 7z^3 + 7z^4 + z^7, we have

Pu(E)=7p3(1p)4+7p4(1p)3+p7P_u(E) = 7p^3(1-p)^4 + 7p^4(1-p)^3 + p^7

If p = 0.01, then P_u(E) \approx 6.79 \times 10^{-6}.
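The arithmetic is a one-liner to reproduce:

```python
p = 0.01
P_u = 7*p**3*(1 - p)**4 + 7*p**4*(1 - p)**3 + p**7
print(P_u)   # ~6.79e-06
```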

The corresponding bit error rates can be bounded as well. (Imagine being a naughty lil bit error rate uWu 🥵). P_{u_b} can be lower bounded by assuming that an undetected codeword error corresponds to only a single message bit error, and upper bounded by the worst case in which an undetected codeword error corresponds to all k message bits being in error:

1kPu(E)PubPu(E)\frac{1}{k} P_u(E) \leq P_{u_b} \leq P_u(E)

3.7.2 - Error Correction Performance

Recall that an error pattern is correctable if and only if it is a coset leader in the standard array for the code; thus the probability of correct decoding is the probability that the error is a coset leader. Let \alpha_i denote the number of coset leaders of weight i. The set \{ \alpha_0, \alpha_1, ..., \alpha_n\} is then the coset leader weight distribution. Over a BSC with crossover probability p, the probability of j errors forming one of the coset leaders is

αjpj(1p)nj\alpha_jp^j(1-p)^{n-j}

and the probability of a decoding error is thus the probability that the error is not one of the coset leaders:

P(E)=1j=0nαjpj(1p)njP(E) = 1- \sum_{j=0}^n \alpha_j p^j(1-p)^{n-j}

which applies to any linear code with a complete decoder.

For a binary [n, k] code with weight distribution \{A_i\}, the probability of decoding error for a bounded distance decoder is

P(E) = \sum_{j = d_{min}}^n A_j \sum_{\ell = 0}^{\lfloor (d_{min} - 1)/2 \rfloor} P_\ell^j

The probability of decoder error can be bounded by the probability of *any* error pattern of weight greater than \lfloor (d_{min} - 1)/2 \rfloor:

P(E) \leq \sum_{j = \lfloor (d_{min} - 1)/2 \rfloor + 1}^n {n \choose j}p^j(1-p)^{n-j}

3.9 - Modifications to Linear Codes

An [n,k,d][n, k, d] code is extended by adding an additional redundant coordinate, producing an [n+1,k,d+1][n + 1, k, d+1] code.

Example

The parity check matrix for a [7, 4, 3] Hamming code is extended with a column for the new check bit and an additional row which checks the parity of all of the bits:

H=[10010110010111000010111011111111]H = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}

The last row is the overall check bit row, and we can put the whole matrix into systematic form via linear operations yielding:

H=[10001101010001110010111000011011],G=[10111000111001000111001011010001],H = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 \end{bmatrix}, G = \begin{bmatrix} 1 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix},

Conversely, a code is punctured by deleting one of its parity symbols, such that an [n, k] code becomes an [n - 1, k] code.

Puncturing an extended code may return it to the original code (if the extension symbols are the ones punctured). Puncturing reduces the weight of each codeword by its weight in the punctured positions. The d_{min} of a code is reduced by puncturing if a minimum weight codeword is punctured in a non-zero position; puncturing an [n, k, d] code p times can result in a code with minimum distance as small as d - p.

A code is expurgated by deleting some of its codewords. It is possible to expurgate a linear code in such a way that it remains linear, and the minimum distance may increase; for example, by deleting all of the odd-weight codewords of a binary code.

A code is augmented by adding new codewords. The addition of new codewords may make the code non-linear, and the d_{min} may decrease.

A code is shortened by deleting a message symbol, meaning a row is removed from the generator matrix (corresponding to the removed symbol), and a column is also removed, corresponding to the encoded message symbol such that an [n,k][n, k] code becomes an [n1,k1][n-1, k-1] code.

Finally, a code is lengthened by adding a message symbol, meaning a row and column are added to the generator matrix such that an [n,k][n,k] code becomes an [n+1,k+1][n+1, k+1] code.

4 | Cyclic Codes, Rings, and Polynomials

4.1 - Introduction

In which we introduce methods for designing generator matrices with specific characteristics, like minimum distance, while also reducing the prohibitive complexity of long codes and their corresponding standard arrays.

Cyclic codes rely on polynomial operations which allow us to incorporate design specifications. These operations leverage the algebraic structure known as a ring.

4.2 - Basic Definitions

Given a vector c=(c0,c1,...,cn1)GF(q)n\mathbf c = (c_0, c_1, ..., c_{n-1}) \in GF(q)^n, the vector

c=(cn1,c0,c1,...,cn2)\mathbf c' = (c_{n-1}, c_0, c_1, ..., c_{n-2})

is said to be the cyclic shift of \mathbf c to the right. A right shift by r produces

(cnr,cnr+1,...,cn1,c0,...,cnr1)(c_{n-r}, c_{n-r+1}, ..., c_{n-1}, c_0, ..., c_{n-r - 1})

An [n, k] block code \mathcal C is said to be cyclic if it is linear and if, for each \mathbf c = (c_0, c_1, ..., c_{n-1}) \in \mathcal C, its right cyclic shift \mathbf c' is also in \mathcal C.

Shifting operations can be conveniently represented using polynomials. For example, the codeword \mathbf c = (c_0, c_1, ..., c_{n-1}) can be represented by the polynomial:

c(x)=c0+c1x+c2x2++cn1xn1c(x) = c_0 + c_1x + c_2x^2 + \cdots + c_{n-1}x^{n-1}

The Division Algorithm

Let p(x) be a polynomial of degree n and d(x) be a polynomial of degree m:

deg(p(x))=n,  deg(d(x))=mdeg(p(x)) = n, \; deg(d(x)) = m

The division algorithm for polynomials asserts that there exist polynomials q(x) (q for quotient) and r(x) (r for remainder), where 0 \leq deg(r(x)) < m and p(x) = q(x)d(x) + r(x). The algorithm itself consists of polynomial long division. We say that p(x) is equivalent to r(x) \text{ modulo } d(x):

p(x)r(x)modd(x);p(x)(modd(x))=r(x)p(x) \equiv r(x) \mod d(x); \quad p(x) (\mod d(x)) = r(x)

If r(x)=0r(x) = 0, then d(x)d(x) divides p(x)p(x): d(x)p(x)d(x) \big| p(x), else d(x)p(x)d(x) \nmid p(x).


A non-cyclic shift is represented by polynomial multiplication: xc(x) = c_0x + c_1x^2 + \cdots + c_{n-1}x^n. For a cyclic shift, we move the coefficient of x^n to the constant coefficient position by taking this product modulo x^n - 1. Dividing xc(x) by x^n - 1 using the above "algorithm" we get:

xc(x)= \underbrace{c_{n-1}}_{\text{quotient}}(x^n - 1) + \underbrace{(c_{n-1} + c_0x + c_1x^2 + \cdots + c_{n-2}x^{n-1})}_{\text{remainder}}

s.t. the remainder upon dividing by xn1x^n - 1 is

xc(x)(modxn1)=cn1+c0x++cn2xn1xc(x) (\mod x^n -1) = c_{n-1} + c_0x + \cdots + c_{n-2}x^{n-1}
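A small sketch of the cyclic shift as multiplication by x modulo x^n - 1, storing a codeword as its coefficient list (c_0, ..., c_{n-1}) over GF(2):

```python
def cyclic_shift(c):
    n = len(c)
    shifted = [0] + list(c)                     # x * c(x): raise every degree by one
    shifted[0] = (shifted[0] + shifted[n]) % 2  # reduce mod x^n - 1: fold x^n onto 1
    return shifted[:n]

c = [1, 1, 0, 1, 0, 0, 0]    # c(x) = 1 + x + x^3
print(cyclic_shift(c))       # [0, 1, 1, 0, 1, 0, 0] = (c_6, c_0, ..., c_5)
```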

4.3 - Rings

Rings exceed the usefulness of groups, which are limited to their single operation; rings have two (count 'em) operations! A ring \langle R, +, \cdot \rangle is a set R with two binary operations defined on R, + (addition) and \cdot (multiplication), s.t.:

  • R,+\langle R, + \rangle is an Abelian (commutative) group with the additive identity as 0.
  • The multiplicative operation \cdot is associative
  • Left and Right laws of distributivity hold:
a(b+c)=ab+aca(b + c) = ab + ac
(a+b)c=ac+bc(a + b)c = ac + bc
  • A ring is said to be commutative if ab=ba  a,bRa \cdot b = b \cdot a \; \forall a, b \in R
  • Like groups, the collective ring may be referred to as just the representative set RR
  • A ring with identity has a multiplicative identity for \cdot, typically denoted 1.

Note, however, that the multiplicative operation need not form a group, as there may not be a multiplicative inverse in a ring, even if it has an identity. Some elements over a ring may have a multiplicative inverse, such elements are said to be units.

Example

  • The set of 2 \times 2 matrices under the usual definitions of addition and multiplication forms a ring, though not a commutative one, and not every element has a multiplicative inverse
  • \langle \mathbb Z, +, \cdot \rangle forms a ring, but \langle \mathbb Z, \cdot \rangle does not form a group (only \pm 1 have multiplicative inverses)

Let R be a ring and a \in R. For an integer n, let na denote a + a + \cdots + a with n summands. If a positive integer exists s.t. na = 0 \; \forall a \in R, then the smallest such positive integer is the characteristic of the ring R. If no such positive integer exists, then R is said to be a ring of characteristic 0.

Example

In the ring Z6\mathbb Z_6, the characteristic is 6. In the ring Zn,+,\langle \mathbb Z_n, +, \cdot \rangle, the characteristic is nn. In the ring Q\mathbb Q, the characteristic is 0.

4.3.1 Rings of Polynomials

Let RR be a ring. A polynomial f(x)f(x) of degree nn with coefficients in RR is

f(x)=i=0naixif(x) = \sum_{i=0}^n a_i x^i

where a_n \neq 0. The symbol x is said to be an indeterminate.

The set of all polynomials with an indeterminate xx with coefficients in a ring RR, using the usual operations for polynomial additions and multiplications, forms a ring called the polynomial ring: R[x]R[x].

Example

Let R=Z6,+,R = \langle \mathbb Z_6, +, \cdot \rangle and let S=R[x]=Z6[x]S = R[x] = \mathbb Z_6[x], then some elements in SS are: 0,1,x,1+x,4+2x,5+4x,etc.0, 1, x, 1 + x, 4 + 2x, 5 + 4x, etc. with example operations:

\begin{aligned} (4 + 2x) + (5 + 4x) &= 3 \\ (4 + 2x) \cdot (5 + 4x) &= 2 + 2x + 2x^2 \end{aligned}

Example

\mathbb Z_2[x] is the ring of polynomials with coefficients that are either 0 or 1, with operations modulo 2. An example of arithmetic in this ring is:

(1+x)(1+x)=1+x+x+x2=1+x2(1 + x)(1 + x) = 1 + x + x + x^2 = 1 + x^2

since x+x=0x + x = 0 in this ring.

It is clear that polynomial multiplication does not have an inverse in general; e.g., in the ring of polynomials with real coefficients \Reals[x], there is no polynomial solution f(x) to f(x)(x^2 + 3x + 1) = 1.

Polynomials can represent a sequence of numbers in a single object. One reason polynomials are of interest is that their multiplication is equivalent to convolution of the sequence \mathbf a = \{a_0, a_1, ..., a_n \} with \mathbf b = \{b_0, b_1, ..., b_m \}

via the following polynomials:

\begin{aligned} a(x) &= a_0 + a_1x + a_2x^2 + \cdots + a_nx^n \\ b(x) &= b_0 + b_1x + b_2x^2 + \cdots + b_mx^m \end{aligned}

and multiplying them: c(x)=a(x)b(x)c(x) = a(x)b(x) s.t. the coefficients of c(x)c(x) are

c(x)=c0+c1x+c2x2++cn+mxn+mc(x) = c_0 + c_1x + c_2x^2 + \cdots + c_{n+m}x^{n+m}

are exactly the values obtained by convolving \mathbf a and \mathbf b: c_i = \sum_{j} a_j b_{i-j}.
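numpy makes the equivalence explicit; multiplying the two \mathbb Z_6[x] polynomials from the earlier example is a convolution of the coefficient sequences followed by reduction mod 6:

```python
import numpy as np

a = [4, 2]             # a(x) = 4 + 2x
b = [5, 4]             # b(x) = 5 + 4x
c = np.convolve(a, b)  # coefficients of c(x) = a(x) b(x) over the integers
print(c)               # [20 26  8] -> 20 + 26x + 8x^2
print(c % 6)           # [2 2 2] -> 2 + 2x + 2x^2, as computed in Z_6[x] above
```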

4.4 - Quotient Rings

Returning to the idea of factor groups: given a group and a subgroup, a set of cosets is formed by "translating" the subgroup. We perform a similar construction over a ring of polynomials, assuming the underlying ring is commutative.

Consider the ring of polynomials GF(2)[x] (polynomials with binary coefficients) and the polynomial x^3 - 1. In a ring of characteristic 2, x^n - 1 = x^n + 1; in other rings, the modulus should be of the form x^n - 1 only. We divide the polynomials up into equivalence classes depending on their remainder modulo x^3 + 1.

E.g., the polynomials in S_0 = \{ 0, x^3 + 1, x^4 + x, x^5 + x^2, x^6 + x^3, ... \} all have remainder zero when divided by x^3 + 1, so we write S_0 = \langle x^3 + 1 \rangle, the set generated by x^3 + 1. The polynomials in S_1 = \{1, x^3, x^4 + x + 1, x^5 + x^2 + 1, ... \} all have remainder 1 when divided by x^3 + 1, so we write S_1 = 1 + S_0 = 1 + \langle x^3 + 1 \rangle, and so on:

\begin{aligned} S_1 &= \{1, x^3, x^4 + x + 1, x^5 + x^2 + 1, x^6 + x^3 + 1, ... \} \\ &= 1 + S_0 \\ S_2 &= \{ x, x^3 + x + 1, x^4, x^5 + x^2 + x, x^6 + x^3 + x , ... \}\\ &= x + S_0 \\ S_3 &= \{ x + 1, x^3 + x, x^4 + 1, x^5 + x^2 + x + 1, x^6 + x^3 + x + 1, ...\}\\ &= x + 1 + S_0 \\ S_4 &= \{ x^2 ,x^3 + x^2 + 1, x^4 + x^2 + x, x^5, x^6 + x^3 + x^2,... \}\\ &= x^2 + S_0 \\ \vdots \\ S_7 &= \{ x^2 + x + 1, x^3 + x^2 + x, x^4 + x^2 + 1, x^5 + x + 1, x^6 + x^3 + x^2 + x + 1,... \}\\ &= x^2 + x + 1 + S_0 \end{aligned}

Thus S_0, S_1, ..., S_7 form the cosets of GF(2)[x] modulo the subgroup \langle x^3 + 1 \rangle, which exhaust all possible remainders after dividing by x^3 + 1; every polynomial in GF(2)[x] falls into exactly one of these sets.
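A sketch sorting polynomials into these cosets by computing remainders modulo x^3 + 1 over GF(2), with polynomials as coefficient lists (constant term first):

```python
def mod_poly(p, n=3):
    # reduce mod x^n + 1 over GF(2): x^n = 1, so degree d folds onto d mod n
    r = [0] * n
    for d, coeff in enumerate(p):
        r[d % n] = (r[d % n] + coeff) % 2
    return r

print(mod_poly([0, 0, 0, 1]))        # x^3       -> [1, 0, 0] = 1, so x^3 is in S_1
print(mod_poly([0, 0, 1, 0, 0, 1]))  # x^2 + x^5 -> [0, 0, 0],  so it is in S_0
```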

We can define induced ring operations for both addition and multiplication which give some large tables similar to those we've seen for other groups modulo their order.

Let R = \{ S_0, S_1, ..., S_7 \} with such an addition table s.t. \langle R, + \rangle forms an Abelian group with S_0 as the identity. However, not every element has a multiplicative inverse, so even \langle R \text{\textbackslash} S_0, \cdot \rangle does not form a group, but \langle R, +, \cdot \rangle does form a ring, denoted GF(2)[x]/\langle x^3 + 1 \rangle: the ring of polynomials in GF(2)[x] modulo x^3 + 1. In general, the ring GF(2)[x]/\langle x^n - 1 \rangle is denoted R_n, and \mathbb F_q[x]/\langle x^n - 1\rangle is denoted R_{n,q}.

Each equivalence class can be identified by its element of lowest degree:

S0    0,S1    1,S2    xS3    x+1,S4    x2,S5    x2+1S6    x2+x,S7    x2+x+1,  \begin{aligned} &S_0 \iff 0, \quad &S_1 \iff 1, \quad &S_2 \iff x \\ &S_3 \iff x+1, \quad &S_4 \iff x^2, \quad &S_5 \iff x^2+1 \\ &S_6 \iff x^2 + x, \quad &S_7 \iff x^2 + x + 1, &\; \end{aligned}

Let \mathcal R be the set of these elements of lowest degree, with addition and multiplication defined on polynomials modulo x^3 + 1; then \langle \mathcal R, +, \cdot \rangle forms a ring.

Two rings \langle R, +, \cdot \rangle, \langle \mathcal R, +, \cdot \rangle are isomorphic if there exists a bijective function \phi: R \rarr \mathcal R, called the isomorphism, such that \forall a, b \in R

ϕ(a+b)=ϕ(a)+ϕ(b)ϕ(ab)=ϕ(a)ϕ(b)\begin{aligned} \phi(a + b) &= \phi(a) + \phi(b)\\ \phi(ab) &= \phi(a)\phi(b) \end{aligned}

Similarly, two rings are homomorphic if there exists a function preserving the above structural behavior which need not be a bijection (a homomorphism).

4.5 - Ideals in Rings