The idea of the one time pad is simple...
To transmit an $n$ bit message $M$, Alice generates an $n$ bit key $k$ at random.
She then somehow distributes $k$ to Bob (possibly the key was already distributed).
The ciphertext is $C = M \oplus k$.
After receiving $C$, Bob decrypts $C$ using $k$ to recover $M$:
decryption = $C \oplus k = M \oplus k \oplus k = M$
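A minimal sketch of OTP encryption and decryption in Python (the message is a made-up example; secrets.token_bytes supplies the random key):
import secrets

def xor_bytes(a, b):
    # XOR two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

M = b"ATTACK AT DAWN"
k = secrets.token_bytes(len(M))   # random key, as long as the message

C = xor_bytes(M, k)               # encryption: C = M xor k
assert xor_bytes(C, k) == M       # decryption: C xor k = M xor k xor k = M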
The OTP is interesting because it achieves information theoretic security.
This means that the ciphertext $C$ contains no information about $M$ in the absence of $k$.
This is in the sense of "information theory" as developed by Claude Shannon.
His famous original paper on the topic is still very accessible.
The first cryptographer to realize the significance of OTP was Joseph Mauborgne.
The chances that an enemy with infinite computational resources can decrypt $C$ without $k$ are zero.
The reason is that for any possible decrypt $M'$, there is exactly one $k'$ such that
$$ M' = C \oplus k' $$This means that any decrypt is equally likely from Eve's point of view.
With no knowledge of $k$, why would Eve prefer $M$ to $M'$ as a possible decrypt?
She wouldn't... and therefore she can't decrypt $C$ without $k$, even with an infinitely fast computer.
You might notice that OTP is not very widely used. It is used sometimes, but rarely.
What are the disadvantages? The key must be truly random, as long as the message itself, securely distributed, and used only once.
Issues similar to the ones that arise from reusing the key in OTP come up a lot in cryptography.
What goes wrong?
Suppose $C_1 = k \oplus M_1$ and $C_2 = k \oplus M_2$.
Then if Eve has $C_1$ and $C_2$, she can compute
$$C_1 \oplus C_2 = k \oplus M_1 \oplus k \oplus M_2 = M_1 \oplus M_2.$$If $M_1$ and $M_2$ are long enough and non-random enough, both $M_1$ and $M_2$ can be recovered from $M_1 \oplus M_2$.
The situation is even worse if $k$ is reused more than twice.
Also if $M_1$ is known then finding $M_2$ from $M_1 \oplus M_2$ is easy.
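A minimal sketch of the two-time pad problem (the messages are made-up, equal-length examples):
import secrets

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

M1 = b"MEET AT NOON TODAY"
M2 = b"THE CODE IS XYZZYQ"              # same length as M1
k  = secrets.token_bytes(len(M1))       # one key, reused twice

C1, C2 = xor_bytes(M1, k), xor_bytes(M2, k)
leak = xor_bytes(C1, C2)                # the key drops out: equals M1 xor M2
assert leak == xor_bytes(M1, M2)
assert xor_bytes(leak, M1) == M2        # knowing M1 hands Eve M2 immediately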
Modern ciphers try to imitate the strengths of OTP without the downsides.
The idea is to shorten the key to something reasonable: 128 or 256 bits.
A small key will then produce a huge pseudo-random key stream.
This means we lose information theoretic security.
But hopefully we can still have computational security.
Most modern ciphers rely on cryptosystems believed to be computationally secure.
This means that while the ciphertext may in principle contain information about the plaintext in the absence of the key, it takes too long to extract.
In practice, for symmetric key cryptography, if a key is $n$ bits we want an exhaustive check of all $2^n$ possible keys to be the only way to learn the plaintext.
Recall that a design goal for a cryptosystem is to produce ciphertext that is indistinguishable from random bits.
If you can crack a cipher then you can definitely distinguish ciphertext from randomness.
So "indistinguishability" is a very strong property: You can't even begin to crack the cipher.
We learned that the random pad in OTP gives us random ciphertext.
So what we need are good random numbers, but a small key.
There are at least 3 types of random number generators: true random number generators (TRNGs), pseudo-random number generators (PRNGs), and cryptographically secure pseudo-random number generators (CSPRNGs).
A common source of cryptographic weakness is to use a PRNG where a CSPRNG is needed.
True random number generators are based on the physical world...
White noise from electrical systems, typing delays, network delays, various jitterings, all pooled together.
The downside is that this is slow.
See this article on the Linux hardware TRNG/CSPRNG.
Usually we use a TRNG to produce a small seed for a PRNG.
A pseudo-random number generator takes a small amount of true randomness called the seed as input.
It then "blows up" this small amount of true randomness to an arbitrary amount of seeming randomness.
PRNGs are used a lot in gaming, simulations, experiments, and code testing.
A PRNG is usually not a CSPRNG.
The C++ rand() function is a PRNG (and not a CSPRNG).
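The same split exists in Python (the language used for the examples in these notes): the random module is a PRNG (a Mersenne Twister), while the secrets module and os.urandom draw from the operating system's CSPRNG.
import os, random, secrets

random.seed(42)                  # reproducible: fine for simulations, fatal for crypto
print(random.getrandbits(32))    # Mersenne Twister output; predictable from enough samples

print(secrets.randbits(32))      # OS-backed CSPRNG, suitable for keys and nonces
print(os.urandom(16).hex())      # 16 random bytes from the same kind of source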
A cryptographically secure pseudo-random number generator is a PRNG with an extra property.
Roughly, the stream of bits it produces can't be extended without knowing the seed.
Precisely:
Given $s_1,s_2,\ldots,s_n$ bits of output from a CSPRNG it is computationally infeasible to predict the next bit, $s_{n+1}$, with accuracy significantly better than 0.5 (random guessing).
We'll see that most PRNGs emphatically lack this property!
Stream ciphers seek to imitate the OTP by using a CSPRNG.
A TRNG is used to seed the CSPRNG. The key is the seed, possibly together with some parameters of the CSPRNG.
The CSPRNG then generates a stream of pseudo-random bits called the key stream:
$$s_0,s_1,s_2,\ldots$$The plaintext $x_0,x_1,x_2,\ldots$ is then encrypted as $y_0,y_1,y_2,\ldots$, where
$$y_i = s_i + x_i\mod 2$$just as in OTP.
There are two broad approaches to how the keystream might be produced.
A state based approach in which each key stream bit depends on all previous bits.
A counter based approach in which each key stream bit depends only on the key, the nonce, and a counter giving its position in the stream (not on the previous bits).
Note that both of these require a nonce = Number Used Once.
The point of the nonce is to allow the same key to be reused without generating an identical keystream.
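As an illustration of the counter based approach, here is a rough sketch (not any of the real ciphers discussed later) in which each keystream block is derived by hashing the key, the nonce, and a counter, so reusing the key with a fresh nonce gives a different stream:
import hashlib

def keystream(key: bytes, nonce: bytes, nbytes: int) -> bytes:
    # Sketch of a counter-based keystream: block i = SHA-256(key || nonce || i)
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:nbytes]

def xor_with_keystream(key, nonce, data):
    s = keystream(key, nonce, len(data))
    return bytes(x ^ y for x, y in zip(data, s))   # y_i = s_i xor x_i

key, nonce = b"\x01" * 16, b"\x02" * 12
C = xor_with_keystream(key, nonce, b"stream cipher demo")
assert xor_with_keystream(key, nonce, C) == b"stream cipher demo"   # decryption is the same operation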
Suppose that Eve knows some plaintext $x_0,x_1,x_2$ and observes the ciphertext $y_0,y_1,y_2$.
She can then solve for $s_i$ in
$$y_i = s_i + x_i\mod 2$$to get
$$s_i = y_i + x_i\mod 2$$Thus it is realistic to assume Eve knows some of the keystream.
But because we use a CSPRNG, this little bit of stream does not allow Eve to produce more of the stream.
One PRNG that works fine for non-cryptographic applications is the linear congruential generator.
Basic algorithm:
$$S_{n+1} = S_na+c \mod\,m$$Here $m$ is public and $a,c$ are secret.
The seed is $S_0$.
m,a,c = 24,6,2
s = 3
outputs = []
while len(outputs)==len(set(outputs)):
    s = (s*a+c)%m
    outputs.append(s)
outputs
[20, 2, 14, 14]
It is useful to have an LCG with a long period.
The period is the number of updates before an output is repeated.
The period depends on the values of $m,a,c,$ and $S_0$.
m,a,c = 9,4,1
s = 0
outputs = []
while len(outputs)==len(set(outputs)):
    s = (s*a+c)%m
    outputs.append(s)
outputs
[1, 5, 3, 4, 8, 6, 7, 2, 0, 1]
It's informative to see how LCG fails as a CSPRNG.
Remember that the important property of a CSPRNG is that without knowledge of the key (including the seed and state variables) the sequence is unpredictable.
It must be unpredictable even if you have already seen some of it.
Suppose we use the above LCG with $m=9, a=4,c=1,$ and $S_0 = 0$.
Suppose Eve knows $m$ but not $a,c,$ or $S_0$.
Alice encrypts a message to Bob by using the keystream 1, 5, 3, 4, 8, 6, 7.
Eve happens to know the first few chunks of plaintext and can use that to determine that the keystream begins 1, 5, 3.
Now Eve can write down the equations
\begin{align} 5 &\equiv a\cdot 1 + c \mod 9 \\ 3 &\equiv a\cdot 5 + c \mod 9 \end{align}Simple linear algebra will solve this system of two equations in the two unknowns $a,c$.
Then, knowing $a$ and $c$, Eve can generate the rest of the keystream and decrypt the ciphertext.
From the first equation, $c \equiv 5 - a \mod 9$. Substituting this into the second equation and solving for $a$ (using that 7 is the inverse of 4 mod 9): \begin{align} 3 &\equiv a\cdot 5 + 5-a \mod 9 \\ -2 &\equiv a\cdot 4 \mod 9 \\ -2\cdot 7 &\equiv a\cdot 4 \cdot 7 \mod 9 \\ -14 &\equiv 4 \equiv a \mod 9 \end{align} Therefore \begin{align} c &\equiv 5-4 \equiv 1 \mod 9 \end{align}
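The same recovery in Python (a sketch using the example values above; pow(x, -1, m), available in Python 3.8+, computes the modular inverse, which exists here because gcd(4, 9) = 1):
m = 9
s1, s2, s3 = 1, 5, 3                         # keystream values Eve has recovered

a = ((s3 - s2) * pow(s2 - s1, -1, m)) % m    # from s3 - s2 = a*(s2 - s1) mod m
c = (s2 - a * s1) % m                        # from s2 = a*s1 + c mod m
print(a, c)                                  # 4 1

s, rest = s3, []                             # Eve now generates the rest of the keystream
for _ in range(4):
    s = (s * a + c) % m
    rest.append(s)
print(rest)                                  # [4, 8, 6, 7]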
In cryptography, if the enemy can always solve for your key (even approximately) by doing a bunch of modular linear algebra, that's not good.
Good cryptosystems introduce nonlinearity to stop exactly this kind of thing.
Roughly: A cryptosystem is linear if it only uses XOR operations (gives a linear system mod 2).
Common ways to introduce non-linearity: Logical AND and OR.
Note that AND is multiplication mod 2.
Notice that the attack we just did on LCG would work even if $m,a,$ and $c$ were 1000s of bits long (though you would need to know more of the plaintext).
This may be an example of a cipher that is CTO (ciphertext-only) secure, but not KPA secure.
Another kind of linear PRNG is the LFSR.
While LFSRs are not CSPRNGs, they are often used as subcomponents of nonlinear ciphers.
Examples are ORYX, A5/1, Trivium, and Grain 128-a.
The first two of these ciphers are completely broken and the 3rd is thought to be insecure :)
The workings of an LFSR are well suited to hardware. Here is a diagram of a small LFSR:
There is a 16 bit state. During each clock cycle one bit is produced as the state shifts one place to the right.
The XOR of tap bits from the state produces a new bit introduced on the left.
This diagram numbers the bits by significance but it is more convenient to number the slots the other way: in the order in which bits are produced.
Thus 16 should be 0, 14 should be 2, 13 should be 3, and 11 should be 5.
Notice that if the internal state of an LFSR ever repeats, then the LFSR will begin to repeat its values.
The above LFSR has a 16 bit state, so the state can take on at most $2^{16}-1$ nonzero values.
If the all-zero state ever occurs, then the machine produces 0 forever after.
Therefore the period of this LFSR is at most $2^{16}-1$.
In general, the period of an $m$ bit LFSR is at most $2^m-1$.
As the LFSR described above yields bits, they satisfy the following equations.
The first outputs $s_{0},s_{1},\ldots,s_{15}$ are the initial state.
Then
\begin{align} s_{16} &\equiv s_{5}+s_{3}+s_{2}+s_{0}\mod 2 \\ s_{17} &\equiv s_{6}+s_{4}+s_{3}+s_{1}\mod 2 \\ s_{18} &\equiv s_{7}+s_{5}+s_{4}+s_{2}\mod 2 \\ \vdots \\ s_{31} &\equiv s_{20}+s_{18}+s_{17}+s_{15}\mod 2\\ \vdots \\ s_{n} &\equiv s_{n-11}+s_{n-13}+s_{n-14}+s_{n-16}\mod 2 \end{align}The way we described the equations on the previous slide depended on the tap bits of the sample LFSR.
A superior notation uses variables $p_i$ to represent which of the state bits are tapped.
In the above diagram each $p_i \in \{0,1\}$.
The tapped bits in the state of the LFSR are the ones where $p_i = 1$.
For example, suppose $p_0=0,p_1=1,p_2=1,p_3=0,p_4=0$.
Then we get this LFSR:
Using this notation we can represent any $m$ bit LFSR as
\begin{align} s_{m} &\equiv p_{m-1}s_{m-1}+\cdots+p_{1}s_{1}+p_{0}s_{0}\mod 2 \\ s_{m+1} &\equiv p_{m-1}s_{m}+\cdots+p_{1}s_{2}+p_{0}s_{1}\mod 2 \\ s_{m+2} &\equiv p_{m-1}s_{m+1}+\cdots+p_{1}s_{3}+p_{0}s_{2}\mod 2 \\ \vdots \\ s_{m+i} &\equiv p_{m-1}s_{m+i-1}+\cdots+p_{1}s_{i+1}+p_{0}s_{i}\mod 2 \\ \end{align}or
$$s_{m+i} \equiv \sum_{j=0}^{m-1}p_js_{j+i}\mod 2.$$Consider a 5 bit LFSR.
Suppose $p_0=0,p_1=1,p_2=1,p_3=0,p_4=0$ and $s_0 = s_1 = s_3 = 1$ and $s_2 = s_4 = 0$.
What is $s_{10}$?
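A small sketch that runs this recurrence (the taps $p_j$ and the initial state are the values just given):
p = [0, 1, 1, 0, 0]        # p_0, ..., p_4
s = [1, 1, 0, 1, 0]        # s_0, ..., s_4
while len(s) < 11:
    i = len(s) - 5
    # s_{5+i} = sum_j p_j * s_{j+i} mod 2
    s.append(sum(p[j] * s[j + i] for j in range(5)) % 2)
print(s)                   # [1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0], so s_10 = 0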
Each LFSR can be associated with a polynomial
$$P(x) = x^m + p_{m-1}x^{m-1} + \cdots + p_1x+p_0.$$Weirdly an LFSR has maximal period $2^m-1$ if and only if its polynomial is primitive.
A primitive polynomial is a special kind of irreducible polynomial.
The polynomial $x^4+x+1$ is primitive. Its LFSR has period 15.
The polynomial $(1+x)(x+x^2) = x + x^2 + x^2 + x^3 = x+x^3$ (coefficients mod 2, so the $x^2$ terms cancel) is not primitive, since it is not even irreducible.
Let's find the period of its LFSR...
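One way is to iterate the LFSR until its internal state repeats; a sketch (the tap vector is read off from the polynomial's lower-order coefficients, as in the notation above):
def cycle_length(p, state):
    # Step the LFSR until the state repeats; return the length of the cycle it falls into.
    seen, s, steps = {}, tuple(state), 0
    while s not in seen:
        seen[s] = steps
        new_bit = sum(pj * sj for pj, sj in zip(p, s)) % 2
        s = s[1:] + (new_bit,)
        steps += 1
    return steps - seen[s]

print(cycle_length([1, 1, 0, 0], [1, 0, 0, 0]))   # x^4 + x + 1: period 15
print(cycle_length([0, 1, 0], [1, 1, 0]))         # x + x^3: falls into a cycle of length 2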
Because an LFSR is completely linear, you might suspect it is not a CSPRNG.
Suppose that we try to base a stream cipher on an LFSR ( $p_0=0,p_1=1,p_2=1,p_3=0,p_4=0$ and $s_0 = s_1 = s_3 = 1$ and $s_2 = s_4 = 0$).
The secret will be $p_0,p_1,p_2,p_3,$ and $p_4$. We assume that $m=5$ is known.
We will work in the KPA model, where we may assume that Eve knows $s_0,s_1,\ldots,s_9$.
Can Eve discover the secret? Yes!
Eve can write down the following equations.
\begin{align} s_5 &\equiv p_4s_4 + p_3s_3 + p_2s_2 + p_1s_1+p_0s_0 \mod 2 \\ s_6 &\equiv p_4s_5 + p_3s_4 + p_2s_3 + p_1s_2+p_0s_1 \mod 2 \\ s_7 &\equiv p_4s_6 + p_3s_5 + p_2s_4 + p_1s_3+p_0s_2 \mod 2 \\ s_8 &\equiv p_4s_7 + p_3s_6 + p_2s_5 + p_1s_4+p_0s_3 \mod 2 \\ s_9 &\equiv p_4s_8 + p_3s_7 + p_2s_6 + p_1s_5+p_0s_4 \mod 2 \end{align}This gives 5 equations in 5 unknowns which can easily be solved using techniques of linear algebra.
We know the following bits from our KPA (and by running the system ourselves, since in this example we happen to know the parameters):
\begin{align} s_0 &= 1 \\ s_1 &= 1 \\ s_2 &= 0 \\ s_3 &= 1 \\ s_4 &= 0 \\ s_5 &= 1 \\ s_6 &= 1 \\ s_7 &= 1 \\ s_8 &= 1 \\ s_9 &= 0 \end{align}Substituting in the $s_i$ gives this:
\begin{align} 1 &\equiv p_4\cdot 0 + p_3 + p_2\cdot 0 + p_1+p_0 \mod 2 \\ 1 &\equiv p_4 + p_3\cdot 0 + p_2 + p_1\cdot 0+p_0 \mod 2 \\ 1 &\equiv p_4 + p_3 + p_2 \cdot 0 + p_1+p_0\cdot 0 \mod 2 \\ 1 &\equiv p_4 + p_3 + p_2 + p_1 \cdot 0+p_0 \mod 2 \\ 0 &\equiv p_4 + p_3 + p_2 + p_1+p_0\cdot 0 \mod 2 \end{align}We can express the system as a matrix: \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 & 0 \\ \end{bmatrix}
Here the columns stand for: constant, $p_4,p_3,p_2,p_1,p_0$, respectively.
We can now solve the system using row reduction mod 2.
Sort by $p_4$.
\begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 1 \\ \end{bmatrix}Now add equations to eliminate all $p_4$ except the first one...
\begin{bmatrix} 1 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 \\ \end{bmatrix}(When two rows are added mod 2, the constant column is added along with the coefficients.) Continuing in this way, we could add equations to clear out beneath $p_3$, then $p_2$, and so on.
Rather than finish the row reduction, we can also solve the system directly. Simplify the substituted equations by removing the terms multiplied by 0:
\begin{align} 1 &\equiv p_3 + p_1+p_0 \mod 2 \\ 1 &\equiv p_4 + p_2+p_0 \mod 2 \\ 1 &\equiv p_4 + p_3 + p_1 \mod 2 \\ 1 &\equiv p_4 + p_3 + p_2 +p_0 \mod 2 \\ 0 &\equiv p_4 + p_3 + p_2 + p_1 \mod 2 \end{align}By adding the last two equations, we see that
$$p_0+p_1 \equiv 1 \mod 2.$$
Applying this to the first equation gives
$$1 \equiv p_3 + 1 \mod 2.$$
This implies that $p_3 =0$.
Using this in the 3rd equation shows $p_1 + p_4 \equiv 1 \mod 2$.
Thus, substituting $p_1 + p_4 \equiv 1$ into the last equation gives $p_3 + p_2 \equiv 1 \mod 2$. Then $p_2=1$, since $p_3 =0$.
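As a sanity check (and as a picture of what Eve does in general), a short sketch that brute forces all $2^5$ tap settings against the known bits:
from itertools import product

s = [1, 1, 0, 1, 0, 1, 1, 1, 1, 0]      # s_0, ..., s_9, known to Eve

consistent = [p for p in product([0, 1], repeat=5)       # candidate (p_0, ..., p_4)
              if all(s[5 + i] == sum(p[j] * s[j + i] for j in range(5)) % 2
                     for i in range(5))]
print(consistent)
# Only 2 of the 32 settings survive; both have p_3 = 0 and p_2 = 1, as derived above,
# and both predict the same continuation of this keystream.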
What if you try to mask the linearity of an LFSR by feeding its output through a nonlinear function?
This is the idea behind a filtered LFSR.
It turns out filtered LFSRs are vulnerable to many attacks.
What if you substitute a nonlinear function for the XOR in an LFSR?
You get a nonlinear feedback shift register (NFSR).
For example: $s_5 = s_1+s_2+s_1s_2 + s_3s_4.$
These have some good properties, but it's hard to know the period.
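A tiny sketch (using the feedback rule above on a 5 bit register, with an arbitrarily chosen starting state) that just measures the period empirically, since there is no simple formula to predict it:
def nfsr_step(state):
    s0, s1, s2, s3, s4 = state
    new = (s1 + s2 + s1 * s2 + s3 * s4) % 2     # the nonlinear feedback from the example
    return state[1:] + (new,)

state, seen, steps = (1, 0, 0, 1, 0), {}, 0
while state not in seen:
    seen[state] = steps
    state = nfsr_step(state)
    steps += 1
print(steps - seen[state])     # length of the cycle this particular state falls into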
Grain 128-a attempts to retain the best of both LFSRs and NFSRs.