CoCalc -- 2018-02-14.ipynb

| Download

This repository contains the course materials from Math 157: Intro to Mathematical Software.

Creative Commons BY-SA 4.0 license.

Project: Support and Testing

Path: math157 / lectures / 2018-02-14 / 2018-02-14.ipynb

Views: ³⁰³⁷
License: OTHER

Kernel: SageMath 8.1

Math 157: Intro to Mathematical Software

UC San Diego, winter 2018

February 14, 2018: Number theory and cryptography (part 2 of 3)

Administrivia:

Schedule change for this week: due to a schedule conflict, my office hours will take place Thursday 3-4 instead of 4-5.

Public-key cryptography

For most of human history, cryptography has involved using a secret key to encrypt a message to keep its contents away from eavesdroppers. The sender would use the secret key (much like a physical key) to transform the message into something unintelligible, and the recipient would use the same key (but possibly in another manner, much like turning the key the opposite way in a lock) to un-transform the message back into its original form.

Starting in the 1970s, it was discovered that one could use ideas from number theory to implement public-key cryptography, in which the information required to encrypt and decrypt messages is not the same! This is more like a mailbox (or homework dropbox): the encryption key can be made public while the decryption key remains safely secret. (One can also invert the paradigm to make digital signatures, where the secret key is required to sign, but a public key is required to decrypt.)

Public key cryptography is now used widely as part of Internet security; for instance, it underlies the https protocol that web sites like CoCalc use to deliver content to your browser. The signature paradigm is also closely related to the concept of a blockchain, which underlies cryptocurrencies and other emerging e-banking protocols.

There are three major public key techniques in use today:

RSA (Rivest-Shamir-Adleman), based on the difficulty of factoring large numbers;
DH (Diffie-Hellman), based on the difficulty of computing discrete logarithms in the ring of integers modulo a large prime $p$ ;
ECDH (elliptic curve Diffie-Hellman), based on the difficulty of computing discrete logarithms in the group of point on an elliptic curve.

While these can all be used as honest cryptosystems (i.e. protocols for exchanging messages securely), they are all are typically used for secure key exchange. Say Alice and Bob want to communicate with each other. (For some reason, examples of cryptography always involve Alice and Bob.)

First, they use the key-exchange protocol to construct a shared secret key $K$ .
Then they use symmetric key cryptography (e.g. AES) to actually communicate. (This typically turns out to be more efficient in practice than directly using a public-key method.)

This talk will focus on constructing (and breaking) Diffie-Hellman key exchange.

Discrete logarithms

Let $p$ be a prime. Last time, we defined the notion of a primitive root modulo $p$ ; this is an integer $a$ such that the multiplicative order of $a$ mod $p$ is equal to the maximum possible value of $p-1$ . In other words, the quantities $1, a, \dots, a^{p-2} \pmod{p}$ exhaust all of the nonzero residue classes modulo $p$ .

Suppose that $a$ is a primitive root mod $p$ . Then for any $m$ not divisible by $p$ , the discrete logarithm of $m$ mod $p$ , with respect to the base $a$ , is the integer $g \in \{0,\dots,p-2\}$ such that $a^g \equiv m \pmod{p}$ .

In [1]:

mod(2, 101)^19

98

In [4]:

2^19 %101

98

In [2]:

mod(98,101).log(mod(2,101))

19

This looks straightforward enough, but it is actually very difficult to compute discrete logarithms! There is no obvious trick for this.

As a result, the function $g \mapsto a^g \pmod{p}$ is often treated as a one-way function: it is easy to compute but hard to invert. (This is a practical definition, not a formal mathematical one, because we (probably) cannot say for sure that it is actually hard to compute, as opposed to our merely being ignorant.)

The difficulty of integer factorization behaves in a similar (but slightly more complicated) way: if $p,q$ are two prime numbers, then it is easy to form the product $pq$ but much harder to recover $p$ and $q$ from the product. We'll see next time how the RSA method derives its security from this state of affairs.

Aside: quadratic residues

Although this does not play a direct role in Diffie-Hellman, this topic arises naturally at this point and is used in another way in the homework, so here it is.

For $n$ a positive integer, an integer $a$ is a quadratic residue mod $n$ if there exists a perfect square congruent to $a$ modulo $n$ .

For example, say $n = 7$ . To test whether a perfect square is congruent to $a$ modulo $n$ , we need only test $0^2, \dots, 6^2$ . Their residues modulo $7$ are $0, 1, 4, 2, 2, 4, 1$ .

In [5]:

quadratic_residues(107) # for any p, will include 0 and (p-1)/2 other values.

[0,
 1,
 3,
 4,
 9,
 10,
 11,
 12,
 13,
 14,
 16,
 19,
 23,
 25,
 27,
 29,
 30,
 33,
 34,
 35,
 36,
 37,
 39,
 40,
 41,
 42,
 44,
 47,
 48,
 49,
 52,
 53,
 56,
 57,
 61,
 62,
 64,
 69,
 75,
 76,
 79,
 81,
 83,
 85,
 86,
 87,
 89,
 90,
 92,
 99,
 100,
 101,
 102,
 105]

For $p$ an odd prime, the list of quadratic residues mod $p$ will include 0 plus $(p-1)/2$ other values. (If we run over $\{0,\dots,p-1\}$ , no residue can occur more than twice: if $x^2 \equiv y^2 \pmod{p}$ , then $(x+y)(x-y)$ is divisible by $p$ , and so one of $x+y$ or $x-y$ must be.)

For $p$ an odd prime and $a$ not divisible by $p$ , $a$ is a quadratic residue modulo $p$ if and only if its discrete logarithm (with respect to any base) is even.

The Legendre symbol $\left( \frac{a}{p} \right)$ is defined to be $0$ if $a \equiv 0 \pmod{p}$ , $+1$ if $a$ is a nonzero quadratic residue modulo $p$ , and $-1$ if $a$ is not a quadratic residue mod $p$ . This symbol has the multiplicativity property: $\left( \frac{a}{p} \right) \left( \frac{b}{p} \right) = \left( \frac{ab}{p} \right).$ (The less-than-obvious part is that the product of two quadratic nonresidues is a quadratic residue. This would fail if $p$ were not prime.)

In [6]:

legendre_symbol(5, 7)

-1

In [1]:

p = next_prime(10^70)
%time legendre_symbol(3, p) ## How is this feasible?? No way we are exhausting over all residue classes mod p!

CPU times: user 132 ms, sys: 5.27 ms, total: 137 ms
Wall time: 155 ms

1

In [9]:

p%4

1

In [8]:

p%3

1

The law of quadratic reciprocity (originally proved by Gauss) states that if $p,q$ are distinct odd primes, then $\left( \frac{p}{q} \right) \left( \frac{q}{p} \right) = (-1)^{(p-1)(q-1)/4}.$ That is, the two Legendre symbols agree if either $p$ or $q$ is congruent to 1 mod 4, and disagree if they are both congruent to 3 mod 4.

If $a$ is not prime, you can apply quadratic reciprocity by first factoring $a$ . But this is not necessary: if one defines the Kronecker symbol to extend the Legendre symbol via the rule $\left( \frac{a}{b_1} \right) \left( \frac{a}{b_2} \right) = \left( \frac{a}{b_1 b_2} \right),$ then the formula $\left( \frac{p}{q} \right) \left( \frac{q}{p} \right) = (-1)^{(p-1)(q-1)/4}.$ extends to all odd positive integers $p,q$ .

This still leaves the prime 2, but there is another formula of Gauss to handle this case: $\left( \frac{2}{p} \right) = (-1)^{(p-1)/8}.$ That is, if $p$ is an odd prime, then 2 is a quadratic residue mod $p$ if $p$ is congruent to 1 or 7 mod 8, and a quadratic nonresidue mod $p$ if $p$ is congruent to 3 or 5 mod 8.

Upshot: you can compute Legendre (and Kronecker) symbols quickly using a form of the Euclidean algorithm!

In [10]:

a = next_prime(10^31)
b = next_prime(10^32)
legendre_symbol(a, b) # Look how fast this is!

-1

Discrete logarithms are hard!

The main difference between the ordinary logarithm and the discrete logarithm is that, while the former is always easy to calculate, the latter is much more difficult. Let's try doing a discrete logarithm on a much larger field.

In [11]:

p = next_prime(5 * 10^10); print(p)
F = GF(p)
e = randint(1, 11)
c = F(2)^e
%timeit c.log(F(2))

50000000021
100 loops, best of 3: 10.2 ms per loop

In [12]:

p = next_prime(5 * 10^20); print(p)
F2 = GF(p)
e = randint(1, (p-1)/2)
c = F2(2)^e
%timeit c.log(F2(2))

500000000000000000003
1 loop, best of 3: 375 ms per loop

In [13]:

p = next_prime(5 * 10^30); print(p)
F2 = GF(p)
e = randint(1, (p-1)/2)
c = F2(2)^e
%timeit c.log(F2(2))

5000000000000000000000000000027
1 loop, best of 3: 950 ms per loop

In [0]:

This is not scaling well. Practical implementations of Diffie-Hellman use 1000 bit primes (or larger), for which discrete logarithms are believed to be extremely difficult.

From discrete logs to Diffie-Hellman

Next, let's use our machinery to construct a Diffie-Hellman key exchange protocol: here are the steps.

Alice and Bob (publicly) agree on a large prime $p$ , modulo which all computations will take place, and a generator $g$ of $GF(p)$ (i.e., a primitive root mod $p$ ).
Alice chooses a secret, random integer $a$ between $0$ and $p-1$ , while Bob chooses a secret random integer $b$ in the same range.
Alice (publicly) transmits $A=g^a\mod p$ to Bob, while Bob (publicly) transmits $B=g^b\mod p$ to Alice.
Alice and Bob both compute $g^{ab}\mod p = B^a\mod p = A^b\mod p$ , and use that value as their shared secret.

Now, if Eve wants to find their shared secret, she has to be able to compute $g^{ab}$ only knowing $g$ , $g^a$ , and $g^b$ . This is called the Diffie-Hellman problem, and is believed to be as hard as computing discrete logarithms in general.

You should have all the tools to make this yourself: please fill in the outline below.

In [16]:

# Public stuff
p = random_prime(2^64)
Fp = GF(p)
g = Fp.multiplicative_generator()

# Alice's private and public keys
a = Fp.random_element() #...
A = pow(g, a, p)

# Bob's private and public keys
b = randint(0, p-2)
B = pow(g, b, p)

# Alice computes the shared secret
K_alice = pow(B, a, p)

# Bob computes the shared secret
K_bob = pow(A, b, p)

# Finally, check that they are the same
print(K_alice == K_bob)

True

And that's it! Alice and Bob have a shared secret, which they can use to send messages.

Attacking Diffie-Hellman: a generic approach

We're going to consider two attacks on Diffie-Hellman key exchange: the first is a brute force attack on the system, while the second is an attack which works if $p$ is a weak Diffie-Hellman prime. In both cases, we're actually going to attack the discrete logarithm problem directly. That is, given $g^a$ for some unknown $a$ , find $a$ .

The simplest brute force attack, of course, is simply exhaustion. That is, just try every value until we find a hit. Let's try this on a particularly small example.

In [2]:

p = previous_prime(2^20)
Fp = GF(p)
g = Fp.multiplicative_generator()

# Alice's public key, based on an "unknown" random number
A = g^(randint(2, p-1))

# Now let's find it
def exhaust(g, A):
    for a in range(2, g.multiplicative_order()):
        if g^a == A:
            return a

%time print(A, g^exhaust(g,A))

(871540, 871540)
CPU times: user 659 ms, sys: 29.7 ms, total: 689 ms
Wall time: 687 ms

Try this now: try the above with a larger prime, say one with 30 bits. How much harder is it?

In [0]:

n = 30
p = random_prime(2^n)
# Fill in the rest here...

There are better brute-force attacks which use memory to reduce the amount of computation required. We will talk about one particular one, called baby-step giant-step, which works as follows:

Compute $m=\lfloor\sqrt{p-1}\rfloor$ .
For $e=0,\ldots,m-1$ , compute and store $g^e\mod p$ along with the index $e$ .
For $f=0,\ldots,m-1$ , compute $A*g^{-fm}\mod p$ , and check if it's in the list above. If it is, stop, and output the pair $(e,f)$ .

If the above produces an output, we know that $g^e \equiv g^{a-fm}\mod p$ , i.e. $e \equiv a-fm\mod p-1$ , i.e. $a \equiv e+fm\mod p-1$ . Moreover, since we can always write $a<p-1$ as $e+fm$ for $e,f < m$ , this algorithm should always output.

Now, let's try it out.

In [3]:

p=4294967291 # a 32-bit prime
Fp = GF(p)
g = Fp.multiplicative_generator()

# Construct random public key as before
A = g^randint(0,p-1)

def bsgs(g, A):
    # Step 1
    m = floor(sqrt(g.multiplicative_order()))
    # Step 2
    small_g_powers = {g^e:e for e in range(m)}
    # Step 3
    gpow = A
    mult = g^(-m)
    for f in range(m):
        if gpow in small_g_powers:
            return small_g_powers[gpow]+f*m
        gpow *= mult

%time print(A, g^bsgs(g, A))

(1786304327, 1786304327)
CPU times: user 307 ms, sys: 28.6 ms, total: 335 ms
Wall time: 357 ms

Note that we were able to break a 32-bit problem faster than the exhaustive approach broke a 20-bit problem. Try going a little higher, say a 40-bit prime.

Attacking Diffie-Hellman: weak primes

Finally, let's look at one attack involving a potential weakness of the prime $p$ , namely that $p-1$ has smaller factors. Assume $p-1 = q_1*q_2$ for $q_1, q_2$ relatively prime factors of size $\approx \sqrt{p}$ . Since $x^{p-1}\mod p = 1$ for any $x$ , $(x^{q_1})^{q_2}\mod p = 1$ for any $x$ . That is, if we only look at elements with are $q_1$ -powers of something, they have order $q_2$ . An analogous thing happens if we switch $q_1$ and $q_2$ .

We use this to recover $a\mod q_1$ and $a\mod q_2$ , from which we can use the Chinese Remainder Theorem to recover $a$ .

Find the discrete logarithm of $A^{q_1}$ with respect to $g^{q_1}$ using one of the algorithms above, i.e. $a_2$ such that $g^{a_2*q_1} = A^{q_1}$ . This implies that $a_2*q_1 \equiv a*q_1\mod q-1$ , i.e. $a_2\equiv a\mod q_2$ .
Find the discrete logarithm of $A^{q_2}$ with respect to $g^{q_2}$ using one of the algorithms above, i.e. $a_1$ such that $g^{a_1*q_2} = A^{q_2}$ . This implies that $a_1*q_2 \equiv a*q_2\mod q-1$ , i.e. $a_1\equiv a\mod q_1$ .
Compute $a = CRT(a_1, a_2, q_1, q_2)$ .

In [2]:

# Let's find a particularly nice p of the appropriate form
q_1 = previous_prime(2**32)
for q_2 in range(2**32-1000, 2**32):
    p = q_1*q_2+1
    if is_prime(p): break
print "p = ", p
print "p-1 = ", factor(p-1)

p =  18446739937656050359
p-1 =  2 * 3^2 * 13^2 * 1411889 * 4294967291

In [3]:

Fp = GF(p)
g = Fp.multiplicative_generator()

# Make Alice's public key as usual
A = g^(randint(1,p-1))

def crt_attack(g, A, q_1, q_2):
    a_2 = bsgs(g^q_1, A^q_1)
    a_1 = bsgs(g^q_2, A^q_2)
    return crt(a_1, a_2, q_1, q_2)

%time
print A, g^crt_attack(g, A, q_1, q_2)

CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.01 µs
12352333673032137476 12352333673032137476

Note that the Chinese remainder theorem works with any number of relatively prime moduli, so we can work with all of the prime factors of $p-1$ independently. In particular, no matter how large $p$ is, if $p-1$ can be factored into small values, we can use this technique to quickly attack the discrete logarithm problem modulo $p$ .

As a result, when choosing a prime $p$ for Diffie-Hellman, it is important to first try to factor $p$ to make sure $p-1$ does not have small prime factors.